Query: 181 EAAALYYIAAQHNVNAIAMMTISDNLNNPEEDTSAEERQTTFTDMMKVGLETLISE 236

EAAALYYIAAQH V+ALA+MTISD+L NP+EDT+AEERQ TFTDMMKVGLETLI++ 
Sbjct: 181 EAAALYyiAAQHQVDAIAIOTISDSLVNPDEDTTAEER©n"FTDMMKVGIjETLlAD 236 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2751> which encodes the amino acid
sequence <SEQ ID 2752>. Analysis of this protein sequence reveals the following:

Possible site: 25
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2117 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 
Identities = 210/235 (89%), Positives = 226/235 (95%)

Query: 1 MS I H I EAKQGE IADKI LLPGDPLRAKF I AENFl^AVCFOTVRNMFGYTGTYKGHRVSVM 60 

MSIHI AK+G+ IADKI LLPGDPLRAKFIAENFLEDAVCFN VRNMFGVTGTYKGHRVSVM 
Sbjct: 1 MSIHISAKKGDIADKILLTODPIJlAKFIAENFliEI^VCFNEvraOTFGYTGTYKGHRVSVM 60 

Query: 61 GTGMGMPSISiyARELIvDYGvXTLIRVGTAGAINPDIH\n^LVLAQAAATNSNIIRNDW 120 

GTGMGMPSISIYARELI VDYGVKTLIRVGTAGAI +P++HVRELV1AQAAATNSNI IRND+ 
Sbjct: 61 GTGMGMPSISIYARELIVDYGVTCTLIRVGTAGAIDPEVHvTlELVLAQAAATNSNI IRNDF 120 

Query: 121 PEPDFPQIADFKLLDKAYHIAKEMDITTHVGSVLSSDVFYSNQPDRNMALGKLGVHAIEM 180 

PEFDFPQIADF LLDKAYHIA+EM +TTHVG+VLSSDVFY+N P+RNMALGKLGV AIEM 
Sbjct: 121 PEFDFPQI ADFGLLDKAYH IAREMGVTTHVGNVL^SDVrOTOMPERNMALGKI/JVKA I EM 180 

Query: 181 EAAAL YYLAAQHNVNALAMMT I SDNLNNPEEDTSAEERQTTFTDMMKVGLETL I S 235 

EAAALYYLAAQH+V AL +MTISDNLN+P EDT+AEERQTTFTDMMKVGLETLI + 
Sbjct: 181 EAAALYYI^QHHVKAl/3IMTISDrrt^PTEr/TT^ 235 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 903 

A DNA sequence (GBSx0958) was identified in S.agalactiae <SEQ ID 2753> which encodes the amino 
acid sequence <SEQ ID 2754>. Analysis of this protein sequence reveals the following: 

Possible site: 36
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1710 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9881> which encodes amino acid sequence <SEQ ID 9882>
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2755> which encodes the amino acid 
sequence <SEQ ID 2756>. Analysis of this protein sequence reveals the following: 

Possible site: 21
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1386 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 126/253 (49%) , Positives = 175/253 (68%) , Gaps = 2/253 (0%) 

Query: 3 IEMTDFSTALKVLVDQYSYHNAFLLLQKHGPIiNSDLLFLLEMMKERRELNIDFLFAHQEQ 62 

+ MT+ T L +L+D Y+Y++AF + + + L+LLEM+KERRELN+ FL H + 
Sbjct: 1 LPMTNNQT-LDILLDvYAYNHAFRIAKALPNIPKTALYLLEMLKERRELNLAFIiAEHAAE 59 

Query: 63 WILQEKYNIKL-LHNPYDLELLANYII^TOLEAIOTKNGLIIDFVRSVSPILYRLFMILLAQ 121 

++++Y+ 11+ + E +ANYI+DLE KVKNG I IDFVRSVSPILYRLF+ L+ 
Sbjct: 60 NRTIEDQYHCSLWLNQSLEDEQIANYILDLEVKVKNGAIIDFVRSVSPILYRLFLRLITS 119 

Query: 122 EVPHLHDYIHNARDDHYDTWKFKELKESNHPVLLAFSERWHDSRLTSKSLAECLQLTDLD 181 

E+ p + YI + ++D YDTW F+ + ES+H V A+ + +T+KSLA+ L LT L 

Sbjct: 120 EIPNFKAYIFDTKNDQYDTWHFQAMLESDHEVFKAYLSQKQSRNVTTKSLADMLTLTSLP 179 

Query: 182 EEVKSTI IQLRQFEKS VRNPLAHLI KPFDEQELYRTTQFSSQAFLDQI I FLAKVIGVEYD 241 

+E+K + LR FEK+VRNPLAHLIKPFDE+EL+RTT FSSQAFL+ II LA GV Y 
Sbjct: 180 QEIKDLVFLLRHFEKAVRNPLAHLIKPFDEEELHRTTHFSSQAFLENIITLATFSGVIYR 239

Query: 242 TVNFHYDTVNKLI 254 

F++D +N +1 
Sbjct: 240 REPFYFDDMNAI I 252 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
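
The summary line printed ahead of each alignment (for example "Identities = 126/253 (49%), Positives = 175/253 (68%), Gaps = 2/253 (0%)" above) can be read mechanically. The following is a minimal sketch and not part of the original analysis: the names `parse_alignment_stats` and `truncated_percent` are hypothetical, and the truncation rule is only an assumption drawn from the identity figure reported here (126/253 is 49.8% and is printed as 49%).

```python
import re

# Matches one "Name = num/den (pct%)" field of a BLAST-style summary line.
# Whitespace around "=", "/" and inside the parentheses is tolerated.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_stats(line):
    """Return {field: (count, aligned_length, reported_percent)}."""
    return {
        name.lower(): (int(num), int(den), int(pct))
        for name, num, den, pct in FIELD_RE.findall(line)
    }


def truncated_percent(num, den):
    """Integer percentage, truncated (assumption: not rounded)."""
    return num * 100 // den


line = "Identities = 126/253 (49%) , Positives = 175/253 (68%) , Gaps = 2/253 (0%)"
stats = parse_alignment_stats(line)
count, length, reported = stats["identities"]
print(stats)
print(truncated_percent(count, length) == reported)
```

The sketch only recomputes the identity percentage; it makes no claim about how the Positives percentage was derived.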

Example 904 

A DNA sequence (GBSx0959) was identified in S.agalactiae <SEQ ID 2757> which encodes the amino 
acid sequence <SEQ ID 2758>. This protein is predicted to be CpsY protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 35
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.59 Transmembrane 260 - 276 ( 260 - 276)

Final Results
bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9879> which encodes amino acid sequence <SEQ ID 9880> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2759> which encodes the amino acid 
sequence <SEQ ID 2760>. Analysis of this protein sequence reveals the following: 

Possible site: 35
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1958 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 



WO 02/34771

PCT/GB01/04789

-1003-

Identities = 247/301 (82%) , Positives = 274/301 (90%) 

Query: 1 MRIQQLQWIKIVETGSMNEAAKQLYITQPSLSNaVKNLETEMGIQIFIRNPKGITLTKD 60 

MRIQQL Y+IKIVE GSMNEAAKQL+ITQPSLSNAV++LE EMGI IF RNPKGITLTKD 
Sbjct: 1 MRIQQLHYI IKIVECGSMNEAAKQLFITQPSLSNAVKDLEMEMGITI FNRNPKGITLTKD 60 

Query: 61 GMEFLSYARQILEQTALLEERYKGDNTSRELFSVSSQHYAFWNAFVALFNGTDMTQYEL 120 

G+EFLSYARQI +EQT+LLE+RYK NT RELFSVSSQHYAFWNAFV+L TDMT+YEL 
Sbjct: 61 GvEFLSYARQIIEQTSLLEDRYKNHNTGRELFSVSSQHYAFVVNAFVSLLKRTDMTRYEL 120 

Query: 121 FLRETRTWEIIDDVKNFRSEIGVLFLNSYNRDVLTKLFDDNSLIATTLFTTTPHIFVSKS 180 

FLRETRTWEI IDDVKNFRSEIGVLF+N YNRDVLTKLFDDN L A+ LF PHIFVSKS 
Sbjct: 121 FLRETRTWEIIDDVKNFRSEIGVLFINDYNRDVLTKLFDDNHLTASPLFKAQPHIFVSKS 180

Query: 181 NPIANRKKLNMKDLEDYPYLSYDQGLHNSFYFSEEMMSQIPHPKSIWSDRATLFNLMIG 240

NPLA + L+M DL D+PYLSYDQG+HNSFYFSEEMMSQ+PH KSIWSDRATLFNLMIG 
Sbjct: 181 NPLATKSLLSMDDLRDFPYLSYDQGIHNSFYFSEEMMSQMPHNKSIWSDRATLFNLMIG 240 

Query: 241 LDGYTVATGILNSKLNGDEIVAIPLDVDDVIDIVYIRHDKANLSKMGQKFIDYLLEEVSFN 301 
LDGYTVA+GILNS LNGD+IVAIPLDV D IDIV+I+H+KANLSKMG++FI+YLLEEV+F+

Sbjct: 241 LDGYTVASGILNSNLNGDQIVAIPLDVPDEIDIVFIKHEKANLSKMGERFIEYLLEEVTFD 301 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
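
The "Final Results" blocks in this section all share one line shape: a compartment name, a certainty score and a verdict in parentheses. The following minimal sketch, which is not part of the original analysis, shows one way to read such a block. The names `parse_final_results` and `predicted_location` are hypothetical, the input is assumed to be free of OCR damage, and treating the compartment with the highest non-zero certainty as the predicted location is an assumption of this sketch.

```python
import re

# One "Final Results" line: compartment, certainty score, verdict.
# \W* skips whatever separator ("---", dashes, spaces) sits between fields.
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([0-9]*\.?[0-9]+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {compartment: (certainty, verdict)} for each result line."""
    return {
        comp: (float(score), verdict.strip())
        for comp, score, verdict in RESULT_RE.findall(text)
    }


def predicted_location(results):
    """Compartment with the highest certainty, or None if none is above zero."""
    if not results:
        return None
    comp, (score, _verdict) = max(results.items(), key=lambda kv: kv[1][0])
    return comp if score > 0 else None


# Values taken from the GBS protein of Example 904.
report = """\
bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(report)
print(predicted_location(results))
```

Returning `None` when every score is zero keeps reports such as those with three "Not Clear" lines from being assigned a location by accident.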

Example 905

A DNA sequence (GBSx0960) was identified in S.agalactiae <SEQ ID 2761> which encodes the amino 
acid sequence <SEQ ID 2762>. This protein is predicted to be CpsX protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 32
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-14.91 Transmembrane 22 - 38 ( 13 - 42)
INTEGRAL Likelihood =-14.65 Transmembrane 52 - 68 ( 44 - 77)
INTEGRAL Likelihood = -6.74 Transmembrane 76 - 92 ( 73 - 97)

Final Results
bacterial membrane --- Certainty=0.6965 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAC44935 GB:U56901 putative transcriptional regulator [Bacillus subtilis] 
Identities = 120/389 (30%) , Positives = 196/389 (49%) , Gaps = 17/389 (4%) 

Query: 2 KIGKKIvLMFTAIvLTTVLALGVYLTSAYTFSTGELSKTFKDFSTSSNKSDAIK-QTRAF 60 
KI K+I+L+F A+ L V+ LG Y + E + S+ +++ + + + F

Sbjct: 19 KILKRIMLLF-ALALLVWGLGGYKLYKTINAADESYDALSRGNKSNLRNEVVDMKKKPF 77 

Query: 61 SILLMGVOTGSSERASKWEGNSDSMILvTvNPKTKKTIM'SLE 120 
SIL MG++ +++ +G SDS+I+VT++PK K M S+ RDT L+G + G 
Sbjct: 78 S ILFMGIEDYATKGQ KGRSDSLI WTLDPKNKTMKMLS I PRDTRVQLAG DTTG 130

Query: 121 vEAKLNAAYAAGGAQMAIMWQDLWITIDNWQINMQGLIDLWAVGGITVTNEFDFPI 180 

+ K+NAAY+ GG + TV++ L I ID YV ++ G D++N VGGI V FDF 
Sbjct: 131 SKTKINAAYSKGGKDETvETVENFLQIPIDKYVTVDFDGFKDVINEVGGIDVDVPFDFDE 190 


Query: 181 SIAENEPEYQATVAPGTHKINGEQALVYARMRYDDPEGDYGRQKRQREVIQKVLKKILAL 240 

+E + + G +NGE+AL YARMR D GD+GR RQ++++ ++ ++ + 

Sbjct: 191 KSDVTjESK-RIYFKKGE^LNGEEALAYARMRKQDKRGDFGRNDRQKQILNALIDRMSSA 249 

Query: 241 DSISSYRKILSAVSSNMQTNIEISSRTIPSLLGYRDALRTIKTYQLKGEDATLSDGGSYQ 300

+1+ KI S N++TNI 1+ + +IT+GDL +Y
Sbjct: 250 SNIAKIDKIAEKASENVETNIRITEGLALQQIYSGFTSKKIDTLSITGSDLYLGPNNTYY 309

Query: 301 IOTSIWLLEIQNRIRTELGLHKOTQLKTI^TVYENLYGSTKSQTVNNNYDSSGQAPSYS^ 360 

LE ++R LH++ +T TS + + + S+G + 

Sbjct: 310 FEPDATNLE KVRKTLQEH - LDYTPDTSTGTSGTEDGTDS S SS SGSTGSTGTTTDGTT 365 

Query: 361 SHSSYANYSSGVDTGQSASTDQDSTASSH 389 

+ SSY+N SS T + ST +T SS+ 
Sbjct: 366 NGSSYSNDSS TSSNNSTTNSTTDSSY 391 

There is also homology to SEQ ID 2764. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 906 

A DNA sequence (GBSx0961) was identified in S.agalactiae <SEQ ID 2765> which encodes the amino
acid sequence <SEQ ID 2766>. This protein is predicted to be CpsIaB. Analysis of this protein sequence 
reveals the following: 

Possible site: 41
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.75 Transmembrane 121 - 137 ( 121 - 137)

Final Results
bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9877> which encodes amino acid sequence <SEQ ID 9878> 
was also identified. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 907 

A DNA sequence (GBSx0962) was identified in S.agalactiae <SEQ ID 2767> which encodes the amino 
acid sequence <SEQ ID 2768>. This protein is predicted to be cpsb protein. Analysis of this protein 
sequence reveals the following:

Possible site: 35
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -9.02 Transmembrane 182 - 198 ( 179 - 204)
INTEGRAL Likelihood = -5.57 Transmembrane 30 - 46 ( 24 - 48)

Final Results
bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 10785> and protein <SEQ ID 10786> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: -8.96
GvH: Signal Score (-7.5): 0.11
Possible site: 35
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -9.02 threshold: 0.0
INTEGRAL Likelihood = -9.02 Transmembrane 182 - 198 ( 179 - 204)
INTEGRAL Likelihood = -5.57 Transmembrane 30 - 46 ( 24 - 48)
PERIPHERAL Likelihood = 6.21 113
modified ALOM score: 2.30

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.4609 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
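
In the ALOM report above, the printed count (2) matches the number of INTEGRAL lines and the printed value (-9.02) matches the lowest INTEGRAL likelihood. The following minimal sketch, which is not part of the original analysis, recomputes both figures from the INTEGRAL lines. The names `parse_integral_lines` and `alom_summary` are hypothetical, the relationship between the lines and the summary is an assumption drawn from this report rather than a description of the ALOM program, and the two residue ranges on each line are stored without interpretation.

```python
import re

# One INTEGRAL line: likelihood, residue range, and the range in parentheses.
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_lines(text):
    """Return one dict per INTEGRAL line found in the report text."""
    segments = []
    for m in INTEGRAL_RE.finditer(text):
        start, end, p_start, p_end = (int(g) for g in m.groups()[1:])
        segments.append({
            "likelihood": float(m.group(1)),
            "span": (start, end),            # range printed after "Transmembrane"
            "paren_span": (p_start, p_end),  # range printed in parentheses
        })
    return segments


def alom_summary(segments):
    """(count, lowest likelihood), or (0, None) when there are no segments."""
    if not segments:
        return 0, None
    return len(segments), min(s["likelihood"] for s in segments)


# Lines taken from the report for SEQ ID 10786; PERIPHERAL lines are ignored.
report = """\
INTEGRAL Likelihood = -9.02 Transmembrane 182 - 198 ( 179 - 204)
INTEGRAL Likelihood = -5.57 Transmembrane 30 - 46 ( 24 - 48)
PERIPHERAL Likelihood = 6.21 113
"""
segments = parse_integral_lines(report)
print(alom_summary(segments))
```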

Example 908 

A DNA sequence (GBSx0963) was identified in S.agalactiae <SEQ ID 2769> which encodes the amino 
acid sequence <SEQ ID 2770>. This protein is predicted to be CpsIaD. Analysis of this protein sequence 
reveals the following: 

Possible site: 61
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.44 Transmembrane 149 - 165 ( 149 - 166)

Final Results
bacterial membrane --- Certainty=0.1977 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 909 

A DNA sequence (GBSx0964) was identified in S.agalactiae <SEQ ID 2771> which encodes the amino 
acid sequence <SEQ ID 2772>. Analysis of this protein sequence reveals the following: 

Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-12.26 Transmembrane 276 - 292 ( 270 - 297)
INTEGRAL Likelihood = -4.62 Transmembrane 10 - 26 ( 9 - 28)
INTEGRAL Likelihood = -4.14 Transmembrane 41 - 57 ( 39 - 58)
INTEGRAL Likelihood = -3.24 Transmembrane 100 - 116 ( 100 - 116)
INTEGRAL Likelihood = -3.08 Transmembrane 445 - 461 ( 443 - 461)

Final Results
bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8687> and protein <SEQ ID 8688> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 5.69
GvH: Signal Score (-7.5): -5.63
Possible site: 25
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 5 value: -12.26 threshold: 0.0
INTEGRAL Likelihood =-12.26 Transmembrane 276 - 292 ( 270 - 297)
INTEGRAL Likelihood = -4.62 Transmembrane 10 - 26 ( 9 - 28)
INTEGRAL Likelihood = -4.14 Transmembrane 41 - 57 ( 39 - 58)
INTEGRAL Likelihood = -3.24 Transmembrane 100 - 116 ( 100 - 116)
INTEGRAL Likelihood = -3.08 Transmembrane 445 - 461 ( 443 - 461)
PERIPHERAL Likelihood = 2.23 221
modified ALOM score: 2.95

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.5904 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 910 

A DNA sequence (GBSx0965) was identified in S.agalactiae <SEQ ID 2773> which encodes the amino
acid sequence <SEQ ID 2774>. This protein is predicted to be CpsF. Analysis of this protein sequence
reveals the following:

Possible site: 13
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.60 Transmembrane 79 - 95 ( 78 - 95)

Final Results
bacterial membrane --- Certainty=0.2041 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 911

A DNA sequence (GBSx0966) was identified in S.agalactiae <SEQ ID 2775> which encodes the amino 
acid sequence <SEQ ID 2776>. This protein is predicted to be galactosyltransferase. Analysis of this protein 
sequence reveals the following: 

Possible site: 39
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4634 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 912 

A DNA sequence (GBSx0967) was identified in S.agalactiae <SEQ ID 2777> which encodes the amino 
acid sequence <SEQ ID 2778>. Analysis of this protein sequence reveals the following: 

Possible site: 23
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-12.47 Transmembrane 59 - 75 ( 54 - 82)
INTEGRAL Likelihood =-10.88 Transmembrane 309 - 325 ( 307 - 332)
INTEGRAL Likelihood = -8.07 Transmembrane 33 - 49 ( 28 - 53)
INTEGRAL Likelihood = -6.48 Transmembrane 195 - 211 ( 187 - 212)
INTEGRAL Likelihood = -6.16 Transmembrane 285 - 301 ( 283 - 306)
INTEGRAL Likelihood = -4.09 Transmembrane 222 - 238 ( 221 - 240)
INTEGRAL Likelihood = -3.50 Transmembrane 78 - 94 ( 77 - 96)
INTEGRAL Likelihood = -2.71 Transmembrane 101 - 117 ( 99 - 117)
INTEGRAL Likelihood = -2.44 Transmembrane 8 - 24 ( 7 - 25)
INTEGRAL Likelihood = -1.59 Transmembrane 147 - 163 ( 147 - 164)
INTEGRAL Likelihood = -0.48 Transmembrane 168 - 184 ( 168 - 184)

Final Results
bacterial membrane --- Certainty=0.5989 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB43614 GB:AJ239004 polysaccharide polymerase [Streptococcus pneumoniae] 
Identities = 74/309 (23%), Positives = 137/309 (43%), Gaps = 36/309 (11%) 



Query: 53 FERRKLV- --II FLLFIATILNLFFVHKVTFILTLIFFIiALKDI - - SLKKAFSII IGSRI 107

FE+RK II ++ I T+L + ++ +F+ + I L++ II
Sbjct: 61 FEKRKYTLQFI ISI ILITTLLLYTSIQMQNYVYFTSWFMLIGTIHYDLRRVIKI IFIVS - 119

Query: 108 LGVLLNQIFVKLDLIEIKY VNFYRDGQFILRSDLGFGHPNFIHNFFALTIFLYIV 162

L ++ IF+ L + I Y +N R+ + + GF HPN + ++I
Sbjct: 120 LSIMFISIFISLLMYIIDYKREILINIRRN-ETVRAFTFGFIHPNKFTI VLSNLCLMFIW 178

Query: 163 LJ^KRLKPVvMvLFLTLNYLLYQYTFSRTGYYIVILFI VLIYVTKNSLIKRVFMKLAPYV 222

L RLK + L + Y +T +RT + 1+ L+Y+ ++ + ++ Y
Sbjct: 179 LIKDRLKYYHVTFCLFIQLFFYFFTQTRTALLVSIVIFALLYI- -YMFVENLELRWIGYS 236

Query: 223 QFFLLVFTFLSSTIFFNSN- -FVQKLDVLLTGRLHY-AHLQLVDGLTPFGNSFKE 274

F + F + + F+ SN F +D +LTGR+ A+ + G T +G +
Sbjct: 237 FFCISTFLGVLAFQFYPSNNKFSIFIDNILTGRIKLAAYARTFFGYTFWGQYVDKEIVWD 296

Query: 275 TSVLFDNSYSMLLSMYG WLTMFCMI IY YIYSKKI 1 1 IELQLLLFIMS 1 1 324

TS FD+ YS L+S G++ + +++ Y+ +K +1+ LL + M +
Sbjct: 297 PIWGLTSFTFDSFYSFLMSNAGIIWLLILSVLFVKLQKYLDNKSLIL LLAWSMYAV 352

Query: 325 LFTESFYPS 333

T+ +PS
Sbjct: 353 TETDLIFPS 361

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 




Example 913 

A DNA sequence (GBSx0968) was identified in S.agalactiae <SEQ ID 2779> which encodes the amino 
acid sequence <SEQ ID 2780>. This protein is predicted to be cap8J. Analysis of this protein sequence 
reveals the following: 

Possible site: 57
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3424 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB43613 GB:AJ239004 cap8J [Streptococcus pneumoniae] 
Identities = 94/237 (39%), Positives = 135/237 (56%), Gaps = 10/237 (4%)

Query: 1 MIPKVIHYCWFGGNPLPDNLKKYIKTWREQCPDYEIIEWNEHNYDVSKNVFMREAYTKKN 60 

MIPK IHY WFGG+ PD + K I +W++ PDYEI+EWNE N+D+S + F + AY + 
Sbjct: 1 MIPKKIHYIWFGGSEKPDVVLKCINSWKKYMPDYEI VEWNEDNFDLSDSQFAKSAYESRK 60 


Query: 61 FAYVSDYARLDIIYTYGGFYLDTDVELLKSL-DPLRIHECFIAREISCDVNTGLlIGAVK 119 

+A+ SDYAR 1+ YGG Y DTDVELLK++ D + H F E +VN GL+ + 
Sbjct: 61 WAFASDYARFKILSKYGGIYFDTDVELLKTISDDILAHSSFTGFEYIGEVNPGLVYACMP 120 

Query: 120 GHHFLKSNMSIYDKB--DLTSMKTCOTOTTNLLINRGLKNKNIIQKIDDITIYPRNYFN 177

K + Y+++ D+ L T + T+ L+ + N Q ID + IYP +YF 
Sbjct: 121 DDKIAKYWQYYEQASFDINHL-VTVNTIITDYLLKNNFQKNNQFQIIDGLAIYPDDYFC 179 

Query: 178 PKNLLTGKVDCLTSVTYSIHHYEGSWKSSSFISDSLKIRVRLIIDFLFGYGTYRMLL 234 
+ +V LT T SIHHY +WK+ +LK +V++I+ + G YR LL

Sbjct: 180 GYDQEVKEVR-LTERTISIHHYSATWKTR TLKRKVQMIVKTI IGAENYRKLL 230 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 914 

A DNA sequence (GBSx0969) was identified in S.agalactiae <SEQ ID 2781> which encodes the amino 
acid sequence <SEQ ID 2782>. Analysis of this protein sequence reveals the following: 

Possible site: 44
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3897 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 






>GP:CAA87700 GB:Z47767 WbcL [Yersinia enterocolitica] 
Identities = 60/207 (28%) , Positives = 101/207 (47%) , Gaps = 22/207 (10%) 

Query: 4 IFTPTFNRGYRLSYLYDSLCNQTNKNFIWLIVDDGSEDSTKEIVSNYIKENKVSIVYLYK 63 

+FTPTFNR + L Y S+ Q + WLIVDDGS D+T E+V ++ ENK++I Y+Y+ 
Sbjct: 6 VFTPTFNRAHVLKRCYLSILEQDRDDIEWLIvDDGSTDNTAEVVDSFKIENKIjNIKYIYQ 65 



Query: 64 RNGGKHSAYNLAMRYMQPSDYHVCVDSDDWLLEDAV EIIFKDLESLTLSNRYVG 117

N GK +A+N A+ +Y + +DSDD + ++ +F D E + +

Sbjct: 66 DNSGKQAAWNKAVENAS-GEYFIGLDSDDAFIAGSINKLLSMNAVFDDKEIIGIR A 120 

Query: 118 LWPRYSIaNQGMTWLNPKILEWIPDLKYKyHLKIETCIVINNAYLVDFEFPCFEGENFL 177 

+ +L N +L+ + + + D ++ ++ E I) + +P G NF+ 

Sbjct: 121 ISVSSETLKPNNYYLSNEDKKSSWFD-EFSSGIRGERIDFFKTELLRKYLYPVASGINFI 179 

Query: 178 SEEIMYIYLSKKGYFCPQNRKIYCFDY 204 

E Y ++K+ YCF Y 

Sbjct: 180 PEIWFYSTVAKE YCFYY 196 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 915 

A DNA sequence (GBSx0970) was identified in S.agalactiae <SEQ ID 2783> which encodes the amino
acid sequence <SEQ ID 2784>. This protein is predicted to be eps7. Analysis of this protein sequence 
reveals the following: 

Possible site: 32
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -2.18 Transmembrane 190 - 206 ( 189 - 206)

Final Results
bacterial membrane --- Certainty=0.1871 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB59293 GB:AJ131984 putative galactosyl transferase 
[Streptococcus pneumoniae] 
Identities = 101/312 (32%), Positives = 172/312 (54%), Gaps = 4/312 (1%)

Query: 3 LISIIVPVYNGEIYIGRCLDSILEQTYQNLEIIIIDDGSSDRTGDICEKYFLEDRRIKYF 62 

+IS+IVPVYN Y+ LDS+LEQTY++ E+I+++DGS+D +G+IC++Y I F 

Sbjct: 1 MISVIVPvYNVADYLRFALDSLLEQTYKDFEVILVNDGSTDNSGEICDEYGKLYDNIHVF 60 


Query: 63 YQENRGQSVARNNGVLRCTGDWIAFLDSDDVYLPYSIEVMYNIQKATNADIVLT--SIGN 120 

+++N G S ARN G+ + G++I FLDSDD + PY++E++ IQK + DIV T I 
Sbjct: 61 HKKNGGLSDARNFGLEKSRGEFITFLDSDDYFEPYALELLITIQKKYDVDIVSTKGGITY 120 

Query: 121 FNNTYNTSINSQYLKEIKLYTLEVALEEMYYGKTYGVSPLAKLYPRSNLLSNPYPEGKIH 180

++ Y+ + ++ +K+ T + L +YY VS KLY R +L +P+GKI+ 

Sbjct: 121 SHDIYSKKLMAEDYLTVXILTNKEFIAAVYYNDEMTVSAWGKLYKR-DLFKTIFPKGKIY 179 

Query: 181 EDMDTTFKLISCASKIAVCDIVTAVVYFSDNSTTRTKFNERMLYFFEAIQNNIVFINIjNF 240 
ED+ + + + +A D+ Y S + F++R FF+AI +N I +

Sbjct: 180 EDLYWAERLLNIKTVAHTDLPIYHYYQRQGSIVNSTFSDRQYDFFDAIDHNEAIIKKFY 239 

Query: 241 PHNTSLISAVIYNEVFGGIDICGKMIDFKLYDTVDYYRKKYRKYFKTILFNNRISVKEKV 300 
+ L++A+ V G I + + + + + Y+ ++ N +1 +K KV 

Sbjct: 240 CGDKELl^AAIiNAKRVIGSF-ILSNSAFYNSKNDITKIIRIIKPYYWEVIKNKKIPMKRKV 298



Query: 301 KYILFISS IRYF 312 

+ +LF+ S Y+ 
Sbjct: 299 QCVLFLLSPNYY 310 



Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 916

A DNA sequence (GBSx0971) was identified in S.agalactiae <SEQ ID 2785> which encodes the amino 
acid sequence <SEQ ID 2786>. This protein is predicted to be galactosyltransferase. Analysis of this protein 
sequence reveals the following: 

Possible site: 26
>>> Seems to have an uncleavable N-term signal seq

Final Results
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2787> which encodes the amino acid 
sequence <SEQ ID 2788>. Analysis of this protein sequence reveals the following: 

Possible site: 28
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2065 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 37/111 (33%) , Positives = 61/111 (54%) , Gaps = 3/111 (2%) 


Query: 1 MDKVSIIIPVyNVQSFLNECIESVLftQ-TYSNLEIILVNDGSTDNSGDIC-DYYSEIDGR 58 

M KVSII YN ++++ ++S L+Q T +EII+++D STD+S +1 Y + G+ 
Sbjct: 1 MYKVSIICTNYNKAPWISDALDSFLSQVTDFEVEIIVIDDASTDDSREIIjKSYQKKSSGK 60 

Query: 59 I-FVFHKNNGGLSDARNYG1SR&TGDYIYLLDSDDYLYKEDAIERMVEFSE 108

I +F++ N G++ A G YI D DDY +++ V+ E 

Sbjct: 61 IKLLFNETNIGITKTWIKACLYAKGKYIARCDGDDYWTDSFKLQKQVDVLE 111 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 917 

A DNA sequence (GBSx0972) was identified in S.agalactiae <SEQ ID 2789> which encodes the amino 
acid sequence <SEQ ID 2790>. This protein is predicted to be CpsK. Analysis of this protein sequence 
reveals the following: 

Possible site: 52
>>> Seems to have an uncleavable N-term signal seq

Final Results
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 918

A DNA sequence (GBSx0973) was identified in S.agalactiae <SEQ ID 2791> which encodes the amino 
acid sequence <SEQ ID 2792>. Analysis of this protein sequence reveals the following: 

Possible site: 31
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1956 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 919 

A DNA sequence (GBSx0974) was identified in S.agalactiae <SEQ ID 2793> which encodes the amino
acid sequence <SEQ ID 2794>. This protein is predicted to be capsular polysaccharide. Analysis of this
protein sequence reveals the following:

Possible site: 36
>>> Seems to have an uncleavable N-term signal seq

Final Results
bacterial membrane --- Certainty=0.4524 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 920 

A DNA sequence (GBSx0975) was identified in S.agalactiae <SEQ ID 2795> which encodes the amino 
acid sequence <SEQ ID 2796>. This protein is predicted to be NeuB. Analysis of this protein sequence 
reveals the following:

Possible site: 30
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2992 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 921 

A DNA sequence (GBSx0976) was identified in S.agalactiae <SEQ ID 2797> which encodes the amino 
acid sequence <SEQ ID 2798>. This protein is predicted to be NeuC. Analysis of this protein sequence 
reveals the following: 

Possible site: 41
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3150 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 922 

A DNA sequence (GBSx0977) was identified in S.agalactiae <SEQ ID 2799> which encodes the amino 
acid sequence <SEQ ID 2800>. This protein is predicted to be neuD. Analysis of this protein sequence 
reveals the following: 

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

There is homology to SEQ ID 542.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 923 

A DNA sequence (GBSx0979) was identified in S.agalactiae <SEQ ID 2801> which encodes the amino 
acid sequence <SEQ ID 2802>. Analysis of this protein sequence reveals the following: 

Possible site: 33
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2576 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 924 

A DNA sequence (GBSx0980) was identified in S.agalactiae <SEQ ID 2803> which encodes the amino 
acid sequence <SEQ ID 2804>. Analysis of this protein sequence reveals the following: 

Possible site: 49
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1621 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9875> which encodes amino acid sequence <SEQ ID 9876> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2805> which encodes the amino acid 
sequence <SEQ ID 2806>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1066 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 83/139 (59%), Positives = 111/139 (79%) 

Query: 6 TETHDHQALIQKLLVSIHYLTLFRDEIILVEKTPSLLGKHFSIAIVQNELGEILSKIEAL 65 

TE + HQ LIQKLLVSIHYLTLFRDE+ LVE+TPS+LG F +VQ+ELG+I++ 1+ L 
Sbjct: 4 TEQNSHQILIQKLLVSIHYLTLFRDELKLVERTPSILGGEFPAHbVQSELGDIVAAIDTL 63 

Query: 66 SKQKZLIRSIYWYDESSFKVMNKALAIVEEWIKGLDNLLEFCQSQTVFQAILGDERAHVF 125 

Q++LI S +WY+ES+FK+MNK L IV+ WIKG+D+L++ CQS+ VFQ I+GD+R VF 
Sbjct: 64 DMQQRLIESTFWYEESAFKLMNKTLDI VDNWI KGVDHLIDLCQSKEVFQI I IGDKRIRVF 123 

Query: 126 GILID VYTSLNI INTSLKE 144 

G+L DV++SL + SLKE 
Sbjct: 124 GVLSDVFSSLKVSALSLKE 142 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
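
The summary line of each alignment gives counts over the aligned length together with a percentage. The sketch below, with the hypothetical helper `parse_alignment_stats`, extracts the counts and recomputes a percentage by integer truncation. Truncation reproduces the figures quoted for Example 924; it is an assumption and is not claimed to be the rule the alignment program applies in general.

```python
import re

# Matches "Identities = 83/139", "Positives = 111/139" and "Gaps = 2/649".
STAT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_stats(line):
    """Return {name: (count, aligned_length, truncated_percent)}."""
    stats = {}
    for name, count, length in STAT_RE.findall(line):
        count, length = int(count), int(length)
        # Integer division truncates, e.g. 8300 // 139 == 59.
        stats[name] = (count, length, 100 * count // length)
    return stats


# Summary line copied from Example 924.
stats = parse_alignment_stats(
    "Identities = 83/139 (59%) , Positives = 111/139 (79%)"
)
```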

Example 925 

A DNA sequence (GBSx0981) was identified in S.agalactiae <SEQ ID 2807> which encodes the amino 
acid sequence <SEQ ID 2808>. This protein is predicted to be uracil-DNA glycosylase (ung). Analysis of 
this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3427 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2809> which encodes the amino acid 
sequence <SEQ ID 2810>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4200 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 160/216 (74%), Positives = 185/216 (85%) 

Query: 1 MKHSSWHDLIKRELPNHYYNKINTFMDAVYESGIVYPPRDKVFNAIQITPLKNVKVVIIG 60 
         M HS WH+ IK LP HYY +IN F+D Y SG+VYPPR+ VF A+Q+TPLE KV+I+G 
Sbjct: 1 MAHSIWHEKIKSFLPEHYYGRINHFLDEAYASGLVYPPRENVFKALQVTPLEETKVLILG 60 

Query: 61 QDPYHGPQQAQGLSFSVPDNLPAPPSLQNILKELAEDIGSRSHHDLTSWAQQGVLLLNAC 120 
          QDPYHGP+QAQGLSFSVP+ + APPSL NILKELA+DIG R HHDL++WA QGVLLLNAC 
Sbjct: 61 QDPYHGPKQAQGLSFSVPEEISAPPSLINILKELADDIGPRDHHDLSTWASQGVLLLNAC 120 

Query: 121 LTVPEHQANGHAGLIWEPFTDAVIKWNQKETPWFILWGGYARKKKSLIDNPIHHIIES 180 
           LTVP QANGHAGLIWEPFTDAVIKV+N+K++PWFILWG YARKKK+ I NP HHIIES 
Sbjct: 121 LWPAGQANGHAGLIWEPFTDAVIKVLNEKDSPWFILWGAYARKKKAFITNPKHHIIES 180 

Query: 181 PHPSPLSAYRGFFGSRPFSRTNHFLEEEGIMEIDWL 216 
           PHPSPLS+YRGFFGS+PFSRTN LE+EG+ +DWL 
Sbjct: 181 PHPSPLSSYRGFFGSKPFSRTNAILEKEGMTGVDWL 216 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 926 

A DNA sequence (GBSx0982) was identified in S.agalactiae <SEQ ID 2811> which encodes the amino 
acid sequence <SEQ ID 2812>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.5458 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9873> which encodes amino acid sequence <SEQ ID 9874> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA91549 GB:Z67739 unidentified [Streptococcus pneumoniae] 
Identities = 134/212 (63%) , Positives = 168/212 (79%) 



Query: 1 MNIIIMIIIAYLLGSIQTGLWlGKYFyQVNLRQHGSGNTGTTNTFRILGVKAGIVTLTID 60 

M I+++I+AYLLGSI +GLWIG+ F+Q+NI^+HGSGNTGTTNTFRILG KAG+ T ID 
Sbjct: 1 MITIVLLILAYLLGSIPSGLWIGQVFFQI^REHGSGOTGTTlWFRILGKKaGMATFVID 60 

Query: 61 ILKGTLATLIPIILGITTVSPFFIGFFAIIGHTFPIFAQFKGGKAVATSAGVLLGFAPSF 120 

KGTLATL+PII + VSP G A+ IGHTFPI FA FKGGKAVATSAGV+ GFAP F 
Sbjct: 61 FFKGTLATLLPIIFHLQGVSPLIFGLLAVIGHTFPIFAGFKGGKAVATSAGVIFGFAPIF 120 

Query: 121 FLYLLVIFLLTLYLFSMISLSSITVAWGILSVLIFPLVGFILTDYDWIFTTWILMALT 180 

LYL +IF LYL SMISLSS+T ++ ++ VL+FPL GFIL++YD++F +++ +A 
Sbjct: 121 CLYLAIIFFGALYLGSMISLSSVTASIAAVIGVLLFPLFGFILSNYDFLFIAIILALASL 180 

Query: 181 IIIRHQDNIKRIRKRQENLVPFGLNLSKQKNK 212 

IIIRH+DNI RI+ + ENLVP+GLNL+ Q K 
Sbjct: 181 IIIRHKDNIARIKNKTENLVPWGLNLTHQDPK 212 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2813> which encodes the amino acid 
sequence <SEQ ID 2814>. Analysis of this protein sequence reveals the following: 

Possible site: 17 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAA91549 GB:Z67739 unidentified [Streptococcus pneumoniae] 
Identities = 138/213 (64%) , Positives = 166/213 (77%) 

Query: 28 MKLLLFITIAYLLGSIPTGLWIGQYFYHINLREHGSGNTGTTNTFRILGVKAGTATLAID 87 

M ++ + +AYLLGSIP+GLWIGQ F+ INLREHGSGNTGTTNTFRILG KAG AT ID 
Sbjct: 1 MITIVLLILAYLLGSIPSGLWIGQVFFQINLREHGSGNTGTTNTFRILGKKAGMATFVID 60 

Query: 88 MFKGTLSILLPIIFGMTSISSIAIGFFAVLGHTFPIFANFKGGKAVATSAGVLLGFAPLY 147 

FKGTL+ LLPIIF + +S + G AV+GHTFPI FA FKGGKAVATSAGV+ GFAP++ 
Sbjct: 61 FFKGTLATLLPIIFHLQGVSPLIFGLLAVIGHTFPIFAGFKGGKAVATSAGVIFGFAPIF 120 

Query: 148 LFFLASIFVLVLYLFSMISLASWSAIVGVLSVLTFPAIHFLLPNYDYFLTFIVILLAFI 207 

+LA IF LYL SMISL+SV ++I V+ VL FP F+L NYD+ I++ LA + 
Sbjct: 121 CLYLAIIFFGALYLGSMISLSSVTASIAAV-IGVLLFPLFGFILSNYDFLFIAIILALASL 180 

Query: 208 IIIRHKDNISRIKHHTENLIPWGLNLSKQVPKK 240 

IIIRHKDNI+RIK+ TENL+PWGLNL+ Q PKK 
Sbjct: 181 IIIRHKDNIARIKNKTENLVPWGLNLTHQDPKK 213 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 143/212 (67%) , Positives = 174/212 (81%) 

Query: 1 MNIIIMIIIAYLLGSIQTGLWIGKYFYQVNLRQHGSGNTGTTNTFRILGVKAGIVTLTID 60 

M +++ I IAYLLGSI TGLWIG+YFY +NLR+HGSGNTGTTNTFRILGVKAG TL ID 
Sbjct: 28 MKLLLFITIAYLLGSIPTGLWIGQYFYHINLREHGSGNTGTTNTFRILGVKAGTATLAID 87 

Query: 61 ILKGTLATLIPIILGITTVSPFFIGFFAIIGHTFPIFAQFKGGKAVATSAGVLLGFAPSF 120 

+ KGTL+ L+PII G+T++S IGFFA++GHTFPIFA FKGGKAVATSAGVLLGFAP + 
Sbjct: 88 MFKGTLSILLPIIFGMTSISSIAIGFFAVLGHTFPIFAKFKGGKAVATSAGVLLGFAPLY 147 

Query: 121 FLYLLVIFLLTLYLFSMISLSSITVAVVGILSVLIFPLVGFILTDYDWIFTTWILMALT 180 

+L IF+L LYLFSMISL+S+ A+VG+LSVL FP + F+L +YD+ T +VIL+A 
Sbjct: 148 LFFLASIFVLVLYLFSMISLASWSAIVGVLSVLTFPAIHFLLPNYDYFLTFIVILLAFI 207 

Query: 181 IIIRHQDNIKRIRKRQENLVPFGLNLSKQKNK 212 

IIIRH+DNI RI+ ENL+P+GLNLSKQ K 
Sbjct: 208 IIIRHKDNISRIKHHTENLIPWGLNLSKQVPK 239 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 927 

A DNA sequence (GBSx0983) was identified in S.agalactiae <SEQ ID 2815> which encodes the amino 
acid sequence <SEQ ID 2816>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 928 

A DNA sequence (GBSx0984) was identified in S.agalactiae <SEQ ID 2817> which encodes the amino 
acid sequence <SEQ ID 2818>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1585 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9871> which encodes amino acid sequence <SEQ ID 9872> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA91550 GB:Z67739 DNA topoisomerase IV [Streptococcus pneumoniae] (ver 2) 
Identities = 574/649 (88%), Positives = 617/649 (94%), Gaps = 2/649 (0%) 

Query: 5 LAKQDITVTNYGDDAIQVLEGLDAVRKRPGMYIGSTDGTGLHHLVWEIVDNAVDEALSGF 64 
         ++K++I + NY DDAIQVLEGLDAVRKRPGMYIGSTDG GLHHLVWEIVDNAVDEALSGF 
Sbjct: 1 MSKKEININNYNDDAIQVLEGLDAVRKRPGMYIGSTDGAGLHHLVWEIVDNAVDEALSGF 60 

Query: 65 GNRIDVIINKDGSITVTDHGRGMPTGMHAMGKPTVEVIFTVLHAGGKFGQGGYKTSGGLH 124 
          G+RIDV INKDGS+TV DHGRGMPTGMHAMG PTVEVIFT+LHAGGKFGQGGYKTSGGLH 
Sbjct: 61 GDRIDVTINKDGSLTVQDHGRGMPTGMHAMGIPTVEVIFTILHAGGKFGQGGYKTSGGLH 120 

Query: 125 GVGSSVVNALSSWLEVEIIRDGAIYRQRFENGGKPVTTLKKIGTAPKSKSGTSVSFMPDQ 184 
           GVGSSVVNALSSWLEVEI RDGA+Y+QRFENGGKPVTTLKKIGTAPKSK+GT V+FMPD 
Sbjct: 121 GVGSSVVNALSSWLEVEITRDGAVYKQRFENGGKPVTTLKKIGTAPKSKTGTKVTFMPDA 180 

Query: 185 SVFSTIDFKFNTIAERLKESAFLLKNVTLTLTDNRSEEAEHLEFHYENGVQDFVEYLNED 244 
           ++FST DFK+NTI +ERL ESAFLLKNVTL+LTD R++EA +EFHYENGVQDFV YLNED 
Sbjct: 181 TIFSTTDFKYNTISERLNESAFLLKNVTLSLTDKRTDEA--IEFHYENGVQDFVSYLNED 238 

Query: 245 KETLTPIMFFEGEEQEFHIEVALQYNDGFSDNILSFVNNVRTKDGGTHETGLKSAITKSM 304 
           KE LTP+++FEGE+ F +EVALQYNDGFSDNILSFVNNVRTKDGGTHETGLKSAITK M 
Sbjct: 239 KEILTPVLYFEGEDNGFQVEVALQYNDGFSDNILSFVNNVRTKDGGTHETGLKSAITK™ 298 

Query: 305 NDYARKTGLLKEKDKNLEGSDYREGLSAILSILVPEEHLQFEGQTKDKLGSPLARPIVDG 364 
           NDYARKTGLLKEKDKNLEGSDYREGL+A+LSILVPEEHLQFEGQTKDKLGSPLARP+VDG 
Sbjct: 299 NDYARKTGLLKEKDKNLEGSDYREGLAAVLSILVPEEHLQFEGQTKDKLGSPLARPVVDG 358 

Query: 365 IVSEKLTYFLMENGDIASNLIRKAIKARDAREAARKARDESRNGIQCSKKDKGLLSGKLTP 424 
           IV++KLT+FLMENG+IASNLIRKAIKARDAREAARKARDESRNGKK+KKDKGLLSGKLTP 
Sbjct: 359 IVADKLTFFLMENGEIASNLIRKAIKARDAREAARKARDESRNGKKNKKDKGLLSGKLTP 418 

Query: 425 AQSKNAKKlffiLYLVEGDSAGGSAKQGRDRKFQAILPLRGKAn^ 484 
           AQSKN KNELYLVEGDSAGGSAKQGRDRKFQAILPLRGKV+NTAKAKMADI +KNEEINT 
Sbjct: 419 AQSKNPAKNELYLVEGDSAGGSAKQGRDRKFQAILPLRGKVINTAKAKMADILKNEEINT 478 

Query: 485 MIHTIGAGVGPDFNLDDINYDKIIIMTDADTDGAHIQTLLLTFFYRYMRPLVEEGHVYIA 544 
           MI+TIGAGVG DF+++D NYDKIIIMTDADTDGAHIQTLLLTFFYRYMRPLVE GHVYIA 
Sbjct: 479 MIYTIGAGVGADFSIEDANYDKIIIMTDADTDGAHIQTLLLTFFYRYMRPLVEAGHVYIA 538 

Query: 545 LPPLYKMSKGKGKKEIVEYAWTDIELEELRQKFGKGSLLQRYKGLGEMNADQLWETTMNP 604 
           LPPLYKMSKGKGKKE V YAWTD ELEELR+ + FGKG+ LQRYKGLGEMNADQLWETTMNP 
Sbjct: 539 LPPLYKMSKGKGKKEEVAYAWTDGELEELRKQFGKGATLQRYKGLGEMNADQLWETTMNP 598 

Query: 605 ETRTLIRVTIEDLARAERRVNVLMGDKVPPRRQWIEDNVKFTLEENTVF 653 
           ETRTLIRVTIEDLARAERRVNVLMGDKV PRR+WIEDNVKFTLEE TVF 
Sbjct: 599 ETRTLIRVTIEDLARAERRVNVLMGDKVEPRRKWIEDNVKFTLEEATVF 647 






A related DNA sequence was identified in S.pyogenes <SEQ ID 2819> which encodes the amino acid 
sequence <SEQ ID 2820>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.1518 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 560/649 (86%), Positives = 615/649 (94%) 

Query: 5 LAKQDITVTNYGDDAIQVLEGLDAVRKRPGMYIGSTDGTGLHHLVWEIVDNAVDEALSGF 64 
         L K++IT+ NY DDAIQVLEGLDAVRKRPGMYIGSTD TGLHHL+WEIVDNAVDEALSGF 
Sbjct: 2 LTKKEITINNYNDDAIQVLEGLDAVRKRPGMYIGSTDATGLHHLIWEIVDNAVDEALSGF 61 

Query: 65 GNRIDVIINKDGSITVTDHGRGMPTGMHAMGKPTVEVIFTVLHAGGKFGQGGYKTSGGLH 124 
          G+ I V+INKDGS++V D GRGMPTG HAMG PTV+VIFT+LHAGGKFGQGGYKTSGGLH 
Sbjct: 62 GDDIKVVINKDGSVSVADSGRGMPTGQHAMGIPTVQVIFTILHAGGKFGQGGYKTSGGLH 121 

Query: 125 GVGSSVVNALSSWLEVEIIRDGAIYRQRFENGGKPVTTLKKIGTAPKSKSGTSVSFMPDQ 184 
           GVGSSVVNALS+WLEVEI RDG++YRQRFENGGKPVTTLKK+GTAPKSKSGT V+FMPD 
Sbjct: 122 GVGSSVVNALSAWLEVEITRDGSVYRQRFENGGKPVTTLKKVGTAPKSKSGTVVTFMPDD 181 

Query: 185 SVFSTIDFKFNTIAERLKESAFLLKNVTLTLTDNRSEEAEHLEFHYENGVQDFVEYLNED 244 
           +FSTIDFKFNTI+ERLKESAFLLKNV ++LTD R ++ EFHYENGVQDFVEYLNED 
Sbjct: 182 KIFSTIDFKFNTISERLKESAFLLKNVKMSLTDLRGDDPIIEEFHYENGVQDFVEYLNED 241 

Query: 245 KETLTPIMFFEGEEQEFHIEVALQYNDGFSDNILSFVNNVRTKDGGTHETGLKSAITKSM 304 
           KETLTP+++ EG++Q+F +EVALQYNDGFSDNILSFVNNVRTKDGG+HETGLKSAITK+M 
Sbjct: 242 KETLTPVIYMEGQDQDFQVEVALQYNDGFSDNILSFVNNVRTKDGGSHETGLKSAITKAM 301 

Query: 305 NDYARKTGLLKEKDKNLEGSDYREGLSAILSILVPEEHLQFEGQTKDKLGSPLARPIVDG 364 
           NDYARKT LLKEKDKNLEGSDYREGLSA+LSILVPE+HLQFEGQTKDKLGSPLARPIV+ 
Sbjct: 302 NDYARKTNLLKEKDKNLEGSDYREGLSAVLSILVPEQHLQFEGQTKDKLGSPLARPIVES 361 

Query: 365 IVSEKLTYrLMEISlbDJ^b^ 424 
           IVSEKLT+FL+ENG++AS+L+RKAIKA^ 
Sbjct: 362 IVSEKLTFFLLENGEVASHLWKAIKARD^^ 421 

Query: 425 AQSKNAKKJSTELYLVEGDSAGGsAKQGRDRKFQAILPIiRGKvXi^^ 484 
           AQSKNAKKNELYLVEGDSAGGSAKQGRDRKFQAILPLRGKVLNT KAKMADI+KNEEINT 
Sbjct: 422 AQSKNAKKNELYLVEGDSAGGSAKQGRDRKFQAILPLRGKVLNTEKAKMADILKNEEINT 481 

Query: 485 MIHTIGAGVGPDFNLDDINYDKIIIMTDADTDGAHIQTLLLTFFYRYMRPLVEEGHVYIA 544 
           M++TIGAGVG DFNL+DINYDKIIIMTDADTDGAHIQTLLLTFFYRYMRPLVE GHVYIA 
Sbjct: 482 MVYTIGAGVGADFNLEDINYDKIIIMTDADTDGAHIQTLLLTFFYRYMRPLVEAGHVYIA 541 

Query: 545 LPPLYKMSKGKGKKEIVEYAWTDIELEELRQKFGKGSLLQRYKGLGEMNADQLWETTMNP 604 

Sbjct: 542 LPPLYKMSKGKGKTEKIAYAWTDGELEDLRREFGKGAILQRYKGLGEMNANQLWETTMDP 601 

Query: 605 ETRTLIRVTIEDLARAERRVNVLMGDKVPPRRQWIEDNVKFTLEENTVF 653 
           ETRTLIRVTI+DLARAERRV+VLMGDK PRRQWIEDNVKFTLEENTVF 
Sbjct: 602 ETRTLIRVTIDDLARAERRVSVLMGDKAAPRRQWIEDNVKFTLEENTVF 650 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 929 

A DNA sequence (GBSx0985) was identified in S.agalactiae <SEQ ID 2821> which encodes the amino 
acid sequence <SEQ ID 2822>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.80 Transmembrane 378 - 394 ( 378 - 394) 

Final Results 

bacterial membrane --- Certainty=0.1319 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD34369 GB:AF129764 ParC [Streptococcus mitis] 
Identities = 640/820 (78%) , Positives = 722/820 (88%) , Gaps = 5/820 (0%) 

Query: 1 MSNIQNMSLEDIMGERFGRYSKYI IQERALPD IRDGLKPVQRRILYSMNKDGNTFEKGFR 60 

MSNIQNMSLEDIMGERFGRYSKYI IQ+RALPDIRDGLKPVQRRILYSMNKDGNTF+K +R 
Sbjct: 1 MSNIQNMSLEDIMGERFGRYSKYI IQDRALPDIRDGLKPVQRRILYSMNKDGNTFDKSYR 60 

Query: 61 KSAKSVGNVMGNFHPHGDSSIYDAMVRMSQDWKNRETLIEMHGNNGSMDGDPAAAMRYTE 120 

KSAKSVGN+MGNFHPHGDSS I YDAMVRMSQDWKNRE L+EMHGNNGSMDGDP AAMRYTE 
Sbjct: 61 KSAKSVGNIMGNFHPHGDSSIYDAMVRMSQDWKNREILVEMHGNNGSMDGDPPAAMRYTE 120 

Query: 121 ARLSEIAGYLLQDIDKNTVPFAWNFDDTEKEPTVLPAAFPNLLVNGATGISAGYATDIPP 180 

ARLSEIAGYLLQDIDK TVPF+WNFDDTEKEPTVLPAAFPNLLVNG+TGISAGYATDIPP 
Sbjct: 121 ARLSEIAGYLLQDIDKKIVPFSWNFDDTEKEPTVLPAAFPNLLVNGSTGISAGYATDIPP 180 

Query: 181 HNLAEVIDAVVYMIDHPKAKLDKLMEFLPGPDFPTGAIIQGKDEIRKAYETGKGRVAVRS 240 

HNLAEVIDA VYMIDHP AK+DKLMEFLPGPDFPTG IIQG+DEI+KAYETGKGRV VRS 
Sbjct: 181 HNLAEVIDAAVYMIDHPTAKVDKLMEFLPGPDFPTGGIIQGRDEIKICAYETGKGR'VVVRS 240 

Query: 241 RTAIETLKGGKKQIIvTEIPYEVNKSVLVKRIDDTOvNNKVPGIAEVRDESDRDGLRIAI 300 

+T IE LKGGK+QI++TEIPYE+NK+ LVK+IDDVRVN+KV GIAEVRDESDRDGLRIAI 
Sbjct: 241 KTEIEKLKGGKEQIVITEIPYEINKANLvKKIDDTOWSKVAGIAEVRDESDRDGLRIAI 300 



Query: 301 ELKKEADETIvIjNYLFKYTDLQVNYNFNMVAIDDYTPKQVGLSRILTSYIAHRREIIIAR 360 
ELKK+A+ +VLNYLFKYTDLQ+NYNFNMVAID++TP+QVG+ IL+SYIAHRRE+I+AR 
Sbjct: 301 ELKKDJOTTELVl^LFKXTDLQINYNFIS^ 360 

Query: 361 SKFDKEKMKRLHIVEGLIRVLSILDEVIMiIRASENKMAKENLKVSYEFSEAQAEAIV 420 

S+FDKEKAEKRLHIVEGLIRV+SILDEVIALIRASENKADAKENLKVSY+F+E QAEAIV 
Sbjct: 361 SRFDKEKAEKRLHIVEGLIRVISILDEVIALIRASENKaDAKENLKVSYDFTEEQAEAIV 420 



Query: 421 TLQLyRLTNTDIVTI)REEEEELRQQITOIIjKAIISDERTWYlWMKRELREVK!G<:FANTRRS 480 

TLQLYRLTNTD+V L+EEE ELR++I ML All DERTMYN+MK+ELREVKKKFA R S 
Sbjct: 421 TLQLYRLTIWDVVVLQEEEAELREKIAMLAAIIGDERTMYNLMKKELREVKKKFATPRLS 480 



Query: 481 ELQELAETIEIDTASLIIEEDTYVSVTRGGYVKRTSPRSFNASTVDELGKREDDELIFVS 540 

L++ A+ IEIDTASLI EEDTYVSVT+ GY+KRTSPRSF AST++E+GKR+DD L1FV 

Sbjct: 481 SLEDTAKAIEIDTASLIAEEDTYVSVTKAGYIKRTSPRSFAASTLEEIGKRDDDRLIFVQ 540 

Query: 541 NAKTTQHLLMFIl^GNIAYRPVHELADIRWKDVGEHLSQNLTOFASNEEIIYAELVDDF- 599 

+AKTTQHLLMFT LGN+ YRP+HELADIRWKD+GEHLSQ + NF +NEEI+Y E+VD F 

Sbjct: 541 SAKTTQHLLMFTTLGNVIYRPIHELADIRWKDIGEHLSQTITNFETNEEILYVEWDQFD 600 



Query: 600 TKETYFAVTSLGQIKRFERQEISPWRTYKSKTAKYAKLKSVEDYWTVAPIQLEDVILVT 659 

TYFA T LGQIKR ER+E +PWRTYKSK+ KYAKLK D +V VAPI+L+DV+L++ 
Sbjct: 601 DATTYFAATRLGQIKRVERKEFTPWRTYKSKSVKYAKLKDDTDQIVAVAPIKLDDVLLIS 660 

Query: 660 YNGYALRFSIIffiVPWGSKAAGVKAMNLKDRDHIVSAFIANTTSLYLLTHRGSLKRMAID 719 

NGYALRF+I +VPWG+KAAGVKAMNLK+ D + SAFI NT+S YLLT RGSLKR++ID 
Sbjct: 661 QNGYALRFNIEEVPWGAKAAGVKAMNLKEDDTLQSAFICNTSSFYLLTQRGSLKRVSID 720 

Query: 720 VIPTTSRANRGLQVLRELKSKPHRVFKAGPVYLEDSSFEFDLFSSVSNHEGDTFVLEIMS 779 

IP TSRA RGLQVLRELK+KPHRVF AG V + F DLFS+ T L + S 

Sbjct: 721 EIPATSRAKRGLQVLRELKNKPHRVFLAGSV- -AEQGFVGDLFSTEVEENDQT- -LLVQS 776 

Query: 780 KTGKVYDVDLSQWSFSERTSNGSPVSDKISDEEVFSVKIK 819 

G +Y+ L + SERTSNGSF+SD ISDEEVF ' +K 
Sbjct: 777 NKGTIYESRLQDLNLSERTSNGSFISDTISDEEVFDAYLK 816 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2823> which encodes the amino acid 
sequence <SEQ ID 2824>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.53 Transmembrane 376 - 392 ( 376 - 394) 

Final Results 

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 633/819 (77%), Positives = 719/819 (87%) 



Query: 1 MSNIQNMSLEDIMGERFGRYSKYIIQERALPDIRDGLKPVQRRILYSMNKDGNTFEKGFR 60 
         MSNIQNMSLEDIMGERFGRYSKYIIQERALPDIRDGLKPVQRRILYSMNKDGNTFEKG+R 
Sbjct: 3 MSNIQNMSLEDIMGERFGRYSKYIIQERALPDIRDGLKPVQRRILYSMNKDGNTFEKGYR 62 

Query: 61 KSAKSVGNVMGNFHPHGDSSIYDAMVRMSQDWKNRETLIEMHGNNGSMDGDPAAAMRYTE 120 
          KSAKSVGN+MGNFHPHGDSSIYDAMVRMSQDWKNRE L+EMHGNNGSMDGDP AAMRYTE 
Sbjct: 63 KSAKSVGNIMGNFHPHGDSSIYDAMVRMSQDWKNREILVEMHGNNGSMDGDPPAAMRYTE 122 

Query: 121 ARLSEIAGYLLQDIDKNTVPFAWNFDDTEKEPTVLPAAFPNLLVNGATGISAGYATDIPP 180 
           ARLSEIAGYLLQDI+KNTV FAWNFDDTEKEPTVLPAAFPNLLVNG++GISAGYATDIPP 
Sbjct: 123 ARLSEIAGYLLQDIEKNTVSFAWNFDDTEKEPTVLPAAFPNLLVNGSSGISAGYATDIPP 182 

Query: 181 HNLAEVIDAVVYMIDHPKAKLDKLMEFLPGPDFPTGAIIQGKDEIRKAYETGKGRVAVRS 240 
           HNL+EVIDAVVYMIDHPKA L+KLMEFLPGPDFPTG IIQG DEI+KAYETGKGRV VRS 
Sbjct: 183 HNLSEVIDAVVYMIDHPKASLEKLMEFLPGPDFPTGGIIQGADEIKKAYETGKGRVVVRS 242 

Query: 241 RTAIETLKGGKKQIIVTEIPYEVNKSVLVKRIDDVRVNNKVPGIAEVRDESDRDGLRIAI 300 
           RT IE LKGGK+QIIVTEIPYEVNK+VLVK+IDDVRVNNKVPGI EVRDESDR GLRIAI 
Sbjct: 243 RTEIEELKGGKQQIIVTEIPYEVNKAVLVKKIDDVRVNNKVPGIVEVRDESDRTGLRIAI 302 

Query: 301 ELKKEADETIVLNYLFKYTDLQVNYNFNMVAIDDYTPKQVGLSRILTSYIAHRREIIIAR 360 
           ELKKEAD +IJSIYL KYTDLQVNYNFNMVAID +TP+QVGL +IL+SYI+HR++III R 
Sbjct: 303 ELKKEADSQTILNYLLKYTDLQVNYNFNMVAIDHFTPRQVGLQKILSSYISHRKDIIIER 362 

Query: 361 SKFDKEKAEKRLHIVEGLIRVLSILDEVIALIRASENKADAKENLKVSYEFSEAQAEAIV 420 
           SKFDK KAEKRLHIVEGLIRVLSILDE+IALIR+S+NKADAKENLKVSY+FSE QAEAIV 
Sbjct: 363 SKFDKAKAEKRLHIVEGLIRVLSILDEIIALIRSSDNKADAKENLKVSYDFSEEQAEAIV 422 

Query: 421 TLQLYRLTNTDIVTLREEEEELRQQITMLKAIISDERTMYNVMKRELREVKKKFANTRRS 480 
           TLQLYRLTNTDIVTL+ EE +LR IT L AII DE TMYNVMKRELREVKKKFAN R S 
Sbjct: 423 TLQLYRLTNTDIVTLQNEENDLRDLITTLSAIIGDEATMYNVMKRELREVKKKFANPRLS 482 

Query: 481 ELQELAETIEIDTASLIIEEDTYVSVTRGGYVKRTSPRSFNASTVDELGKREDDELIFVS 540 
           ELQ ++ IEIDTASLI EE+T+VSVTRGGY+KRTSPRSFNAS+++E+GKR+DDELIFV 
Sbjct: 483 ELQAESQIIEIDTASLIAEEETFVSVTRGGYLKRTSPRSFNASSLEEVGKRDDDELIFVK 542 

Query: 541 NAKTTQHLLMFTNLGNLAYRPVHELADIRWKDVGEHLSQNLVNFASNEEIIYAELVDDFT 600 
           AKTT+HLL+FT LGN+ YRP+HEL D+RWKD+GEHLSQ + NFA+ EEI+YA++V F 
Sbjct: 543 OAKTTEHLLLFTTLGNVIYRPIHELTDLRWKDIGEHLSQTISNFATEEEILYADIVTSFD 602 

Query: 601 KETYFAVTSLGQIKRFERQEISPWRTYKSKTAKYAKLKSVEDYVVTVAPIQLEDVILVTY 660 
           + Y AVT G IKRF+R+E+SPWRTYKSK+ KY KLK +D VVT++P+ +ED++LVT 
Sbjct: 603 QGLYVAVTQNGFIKRFDRKELSPWRTYKSKSTKYVKLKDDKDRVVTLSPVIMEDLLLVTK 662 

Query: 661 NGYALRFSINDVPVVGSKAAGVKAMNLKDRDHIVSAFIANTTSLYLLTHRGSLKRMAIDV 720 
           NGYALRFS +VP+ G K+AGVK +NLK+ D + SAF + S ++LT RGSLKRMA+D 
Sbjct: 663 NGYALRFSSQEVPIQGLKSAGVKGINLKNDDSLASAFAVTSNSFFVLTQRGSLKRMAVDD 722 

Query: 721 IPTTSRANRGLQVLRELKSKPHRVFKAGPVYLEDSSFEFDLFSSVSNHEGDTFVLEIMSK 780 
           IP TSRANRGL VLRELK+KPHRVF AG V + S+ +FDLF+ + E + +LE++SK 
Sbjct: 723 IPQTSRANRGLLVLRELKTKPHRVFIAGGVQSDTSAEQFDLFTDIPEEETNQQMLEVISK 782 

Query: 781 TGKVYDVDLSQWSFSERTSNGSFVSDKISDEEVFSVKIK 819 
           TG+ Y++ L S SER SNGSF+SD ISD+EV + + 
Sbjct: 783 TGQTYEIALETLSLSERISNGSFISDTISDQEVLVARTR 821 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
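
The analyses in Example 929 each report a predicted membrane-spanning segment on a line beginning "INTEGRAL". As an illustration only, the sketch below reads the likelihood and the first residue range from such a line; `parse_transmembrane` is a hypothetical helper, and the parenthesised second range is ignored because its meaning is not stated in this text.

```python
import re

# Matches e.g. "INTEGRAL Likelihood = -0.80 Transmembrane 378 - 394 ( 378 - 394)".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+"
    r"Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_transmembrane(line):
    """Return likelihood, start, end and segment length, or None if no match."""
    match = TM_RE.search(line)
    if match is None:
        return None
    start, end = int(match.group(2)), int(match.group(3))
    return {
        "likelihood": float(match.group(1)),
        "start": start,
        "end": end,
        "length": end - start + 1,  # residue positions are inclusive
    }


# Line copied from the GBS analysis in Example 929.
segment = parse_transmembrane(
    "INTEGRAL Likelihood = -0.80 Transmembrane 378 - 394 ( 378 - 394)"
)
```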

Example 930 

A DNA sequence (GBSx0986) was identified in S.agalactiae <SEQ ID 2825> which encodes the amino 
acid sequence <SEQ ID 2826>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3369 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF64593 GB:AF169649 branched-chain aminotransferase IlvE [Lactococcus lactis] 
Identities = 259/340 (76%), Positives = 294/340 (86%) 

Query: 1 MTvNLDWDNLGFAYRKLPFRYISHFKDGKWDDGKLTDDATLHISESSPALHYGQQAFEGL 60 

M +NLDW+NLGF+YR LPFRYI+ FKDGKW G+LT D LHISESSPALHYGQQ FEGL 
Sbjct: 1 MAINLDWENLGFSYRNLPFRYIARFKDGKWSAGEIiTGDNQLHISESSPALHYGQQGFEGL 60 

Query: 61 KAYRTKIX3SIQLFRPDQNAERLQRTADRLLMPHVPTDKFIAAWSVVRANEEFVPPYGTG 120 
           KAYRTKDGS IQLFRPDQNA RLQ+TA RL M V T+ FI AVK W+AN++FVPPYGTG 
Sbjct: 61 

Query: 121 
           ATLY+RPLLIGVGD+ IGVKPA+EYI F VFAMPVGSYFKGGL P+ F++S+EYDRA&P GT 
Sbjct: 121 

Query: 181 
           G AKVGGNYAASL A ++D IYLDP+THTKIEEVGAANFFGIT DN+FITPLS 
Sbjct: 181 

Query: 241 
           PSILPSITKYSLLYLA+ R G++AIEG+V+ +L KF EAGACGTAA+ISPIG I +G+D 
Sbjct: 241 

Query: 301 
           ++F+SETEVGP ++LYDELVGIQFGDVEAPEGWI KVD 
Sbjct: 301 3YIFHSETEVGPTVKRLYDELVGIQFGDVEAPEGWIVKVD 340 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2827> which encodes the amino acid 
sequence <SEQ ID 2828>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1208 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 280/340 (82%), Positives = 308/340 (90%) 





Query: 1 MTVNLDWDNLGFAYRKLPFRYISHFKDGKWDDGKLTDDATLHISESSPALHYGQQAFEGL 60 
         MT+ +DWDNLGF Y KLPFRYIS++K+G+WD G+LT+DATLHISES+PALHYGQQAFEGL 
Sbjct: 16 MTIAIDWDNLGFEYHKLPFRYISYYKNGQWDKGQLTEDATLHISESAPALHYGQQAFEGL 75 

Query: 61 KAYRTKDGSIQLFRPDQNAERLQRTADRLLMPHVPTDKFIAAVKSVVRANEEFVPPYGTG 120 
          KAYRTKDGSIQLFRPD+NA RLQ TADRLLMP V T++FI A K VV+ANE+FVPPYGTG 
Sbjct: 76 KAYRTKDGSIQLFRPDRNAVRLQATADRLLMPQVSTEQFIDAAKQVVKANEDFVPPYGTG 135 

Query: 121 ATLYIRPLLIGVGDIIGVKPAEEYIFTVFAMPVGSYFKGGLTPTNFIVSKEYDRAAPNGT 180 
           ATLY+RPLLIGVGDIIGVKPAEEYIFT+FAMPVG+YFKGGL PTNFIVS+ +DRAAP GT 
Sbjct: 136 ATLYLRPLLIGVGDIIGVKPAEEYIFTIFAMPVGNYFKGGLAPTNFIVSEAFDRAAPYGT 195 

Query: 181 GAAKVGGNYAASLLPGKYAHEKQFSDVIYLDPATHTKIEEVGAANFFGITKDNQFITPLS 240 
           GAAKVGGNYA SLLPGK A FSDVIYLDPATHTKIEEVGAANFFGIT +N+F+TPLS 
Sbjct: 196 GAAKVGGNYAGSLLPGKARKSAGFSDVIYLDPATHTKIEEVGAANFFGITANNEFVTPLS 255 

Query: 241 PSILPSITKYSLLYLAKERFGMEAIEGDVFVDELDKFTEAGACGTAAVISPIGGIQNGDD 300 
           PSILPSITKYSLL LA+ER GM IEGDV ++ELDKF EAGACGTAAVISPIGGIQ D+ 
Sbjct: 256 PSILPSITKYSLLQLAEERLGMTVIEGDVPINELDKFVEAGACGTAAVISPIGGIQYKDN 315 

Query: 301 FHVFYSETEVGPATRKLYDELVGIQFGDVEAPEGWIYKVD 340 
           HVFYSETEVGP TR+LYDELVGIQFGD+EAPEGWI KVD 
Sbjct: 316 LHVFYSETEVGPVTRRLYDELVGIQFGDIEAPEGWIVKVD 355 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 931 

A DNA sequence (GBSx0987) was identified in S.agalactiae <SEQ ID 2829> which encodes the amino 
acid sequence <SEQ ID 2830>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3459 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9365> which encodes amino acid sequence <SEQ ID 9366> 
was also identified. A further related GBS nucleic acid sequence <SEQ ID 10915> which encodes amino 
acid sequence <SEQ ID 10916> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2831> which encodes the amino acid 
sequence <SEQ ID 2832>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3043 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 22/36 (61%), Positives = 30/36 (83%) 

Query: 4 IVSKKDKKIEIQISDAQVTVNGTKVDGYQLVMEKKL 39 

++SKKDKKIEIQ+ D +V VN TK+DGYQL + K++ 
Sbjct: 1 VMSKKDKKIEIQLIDHKVMVNETKIDGYQLQIGKRV 36 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
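
The counts in the alignment summary of Example 931 can be checked directly against the three aligned rows. The sketch below assumes the usual reading of such output: a letter in the middle row marks an identical pair, "+" marks a conservative substitution, and a blank marks neither. The function `score_alignment` is a hypothetical helper; no substitution matrix is consulted, so the "positives" figure simply counts the non-blank characters of the middle row.

```python
def score_alignment(query, midline, subject):
    """Return (identities, positives, aligned_length) for one alignment block."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("all three rows must have the same length")
    # Identical residues at the same column, ignoring gap characters.
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    # Letters and "+" signs in the middle row both count as positives.
    positives = sum(1 for m in midline if m != " ")
    return identities, positives, len(query)


# Rows copied from the GAS/GBS alignment in Example 931.
query = "IVSKKDKKIEIQISDAQVTVNGTKVDGYQLVMEKKL"
midline = "++SKKDKKIEIQ+ D +V VN TK+DGYQL + K++"
subject = "VMSKKDKKIEIQLIDHKVMVNETKIDGYQLQIGKRV"

identities, positives, length = score_alignment(query, midline, subject)
```

For these rows the result agrees with the summary line of the example (22 identities and 30 positives over 36 aligned positions).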

Example 932 

A DNA sequence (GBSx0988) was identified in S.agalactiae <SEQ ID 2833> which encodes the amino 
acid sequence <SEQ ID 2834>. This protein is predicted to be glycyl-tRNA synthetase beta subunit (glyS). 
Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1617 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB73488 GB:AL139077 glycyl-tRNA synthetase beta chain [Campylobacter jejuni] 
Identities = 33/90 (36%), Positives = 49/90 (53%), Gaps = 2/90 (2%) 

Query: 3 RAFNIAEKVTHSVI,vDSSLFENNQEKALYCAILSLELTEDIffiDNLDKLFALSPIINDFFD 62 
         R N+A K H V D SLF E LY+A + + L+ LFAL P I++FF+ 
Sbjct: 570 RLANIATKNPHKV--DESLFVQFAESKLYKAFQEKTKANSLQEKI J ENLFALKPFIDEFFN 627 

Query: 63 NTMVMTDDEKMKQNRLAILNSLVAKARTVA 92 
          M+ +DEK+K NR A++ + A+ +A 
Sbjct: 628 QVMINAEDEKLKNNRQALVYEIYAEFLKIA 657 

There is also homology to SEQ ID 2836. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 933 

A DNA sequence (GBSx0989) was identified in S.agalactiae <SEQ ID 2837> which encodes the amino 
acid sequence <SEQ ID 2838>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4825 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13672 GB:Z99113 ynzC [Bacillus subtilis] 
Identities = 41/72 (56%), Positives = 56/72 (76%) 

Query: 5 KIARINELSKKKKTVGLTGEEKVEQAKLREEYIEGFRRSVRHHVEGIKLVDDEGNDVTPE 64 

KIARINEL+ K K +T EEK EQ KLR+EY++GFR S+++ ++ +K++D EGNDVTPE 
Sbjct: 6 KIARINEI^KAKAGVITEEEKAEQQKLRQEYLKGFRSSMKNTLKSVKIIDPEGNDVTPE 65 

Query: 65 KLRQVQREKGLH 76 

KL++ QR LH 
Sbjct: 66 KLKREQRNNKLH 77 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2839> which encodes the amino acid 
sequence <SEQ ID 2840>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4303 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 79/85 (92%), Positives = 83/85 (96%) 

Query: 1 MDPKKIARINELSKKKKTVGLTGEEKVEQAKLREEYIEGFRRSVRHHVEGIKLVDDEGND 60 

MDPKKIARINEL+KKKKTVGLTG EKVEQAKLREEYIEG+RRSVRHH+EGIKLVD+EGND 
Sbjct: 1 MDPKKIARINELAKKKKTVGLTGPEKVEQAKLREEYIEGYRRSVRHHIEGIKLVDEEGND 60 


Query: 61 VTPEKLRQVQREKGLHGRSLDDPNS 85 

VTPEKLRQVQREKGLHGRSLDDP S 
Sbjct: 61 VTPEKLRQVQREKGLHGRSLDDPKS 85 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
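
Each database hit in these examples is introduced by a header line of the form ">GP:... GB:... description [organism]". The sketch below splits such a header into its parts. `parse_hit_header` and the field names are hypothetical, and reading the "GP:" and "GB:" tokens as protein and nucleotide accession numbers is an assumption rather than something stated in this text.

```python
import re

# Matches e.g. ">GP:CAB13672 GB:Z99113 ynzC [Bacillus subtilis]".
HIT_RE = re.compile(r"^>GP:(\S+)\s+GB:(\S+)\s+(.*?)\s*\[([^\]]+)\]")


def parse_hit_header(line):
    """Split a hit header into accessions, description and organism."""
    match = HIT_RE.match(line.strip())
    if match is None:
        return None
    return {
        "protein_acc": match.group(1),     # assumed: token after "GP:"
        "nucleotide_acc": match.group(2),  # assumed: token after "GB:"
        "description": match.group(3),
        "organism": match.group(4),
    }


# Header copied from Example 933.
hit = parse_hit_header(">GP:CAB13672 GB:Z99113 ynzC [Bacillus subtilis]")
```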

Example 934 

A DNA sequence (GBSx0990) was identified in S.agalactiae <SEQ ID 2841> which encodes the amino 
acid sequence <SEQ ID 2842>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2343 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB69985 GB:U94355 glycerol kinase [Enterococcus casseliflavus]
Identities = 381/496 (76%), Positives = 439/496 (87%)

Query: 3 SEEKYIMAIDQGTTSSRAIIFNKKGEKIASSQKEFPQIFPQAGWVEHNANQIWNSVQSVI 62
         +E+ Y+MAIDQGTTSSRAIIF++ G+KI SSQKEFPQ FP++GWVEHNAN+IWNSVQSVI
Sbjct: 2 AEKNYvMAIDQGTTSSFAIIFDRNGKXIGSSQKEFPQYFPKSGWVEHNANEIWNSVQSVI 61

Query: 63 AGAFIESSIKPGQIEAIGITNQRETTWWDKKTGLPIYNAIVWQSRQTAPIADQLKQEGH 122
          AGAFIES I+P I IGITNQRETTWWDK TG PI NAIVWQSRQ++PIADQLK +GH
Sbjct: 62 AGAFIESGIRPFAIAGIGITNQRETTWWDKTTGQPIANAIVWQSRQSSPIADQLKVDGH 121

Query: 123 TNMIHEKTGLVIDAYFSATKVRWILDHVPGAQERAEKGELLFGTIDTWLVWKLTDGLVHV 182
           T MIHEKTGLVIDAYFSATKVRW+LD++ GAQE+A+ GELLFGTID+WLVWKLTDG VHV
Sbjct: 122 TEMIHEKTGLVIDAYFSATKVRWLLDNIEGAQEKADNGELLFGTIDSWLVWKLTDGQVHV 181

Query: 183 TDYSNAARTMLYNIKELKWDDEILELLNIPKAMLPEVKSNSEVYGKTTPFHFYGGEVPIS 242
           TDYSNA+RTMLYNI +L+WD EIL+LLNIP +MLPEVKSNSEVYG T +HFYG EVPI+
Sbjct: 182 TDYSNASRTMLYNIHKLEWDQEILDLLNIPSSMLPEVKSNSEVYGHTRSYHFYGSEVPIA 241

Query: 243 G^GDQQAALFGQIAFEPGMVKNTYGTGSFIIMNTGEEMQLSQNNLLTTIGYGINGKVHY 302
           GMAGDQQAALFGQ+AFE GM+KNTYGTG+FI+MNTGEE QLS N+LLTTIGYGINGKV+Y
Sbjct: 242 G^GDQQAALFGQMAFEKGMIKNTYGTGAFIVMNTGEEPQLSDNDLLTTIGYGINGKVYY 301

Query: 303 ADEGSIFIAGSAIQWLRDGLRMIETSSESEGIAQSSTSDDEVYWPAFTGLGAPYWDSNA 362
           ALEGSIF+AGSAIQWLRDGLRMIETS +SE LA + D+EVYWPAFTGLGAPYWDS A
Sbjct: 302 ALEGSIFVAGSAIQWLRDGLRMIETSPQSEELAAKAKGDNEVYVVPAFTGLGAPYWDSEA 361

Query: 363 RGSVFGLTRGTSKEDFVKATLQSIAYQVRDVIDTMQVDSGIDIQQLRVDGGAAMNNLLMQ 422
           RG+VFGLTRGT+KEDFV+ATLQ++AYQ +DVIDTM+ DSGIDI L+VDGGAA N+LLMQ
Sbjct: 362 RGAVFGLTRGTTKEDFVRATLQAVAYQSKDVIDTMKKDSGIDIPLLKVDGGAAKNDLLMQ 421

Query: 423 FQADILGIDIAFAKNLETTALGAAFIAGLSVGYWESMDELKELNATGQLFQATMNESRKE 482
           FQADIL ID+ RA NLETTALGAA+IAGL+VG+W+ +DELK + GQ+F M ++
Sbjct: 422 FQADILDIDVQRAANLETTALGAAYLAGLAVGFWiaDLDELKSMAEEGQMFTPEMPAEERD 481

Query: 483 KLYKGWRKAVKATQVF 498
           LY+GW++AV ATQ F
Sbjct: 482 NLYEGWKQAVAATQTF 497

A related DNA sequence was identified in S.pyogenes <SEQ ID 2843> which encodes the amino acid 
sequence <SEQ ID 2844>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2282 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 464/500 (92%), Positives = 484/500 (96%)

Query: 3 SEEKYIMAIDQGTTSSRAIIFNKKGEKIASSQKEFPQIFPQAGWWHNANQIWNSVQSVI 62
         S+EKYIMAIDQGTTSSRAI I FN+KGEK++SSQKEFPQI FP AGWVEHNANQIWNSVQSVI
Sbjct: 2 SQEKYIMAIDQGTTSSRAIIFNQKGEKVSSSQKEFPQIFPHAGWVEHNANQIWNSVQSVI 61

Query: 63 AGAFIESSIKPGQIEAIGITNQRETTWWDKRTGLPIYNAIWQSRQTAPIADQLKQEGH 122
          AGAFIESSIKP QIEAIGITNQRETTWWDKKTG+PIYNAIVWQSRQTAPIA+QLKQ+GH
Sbjct: 62 AGAFIESSIKPSQIEAIGITNQRETTVVWDKfCTGVPIYNAIVWQSRQTAPIAEQLKQDGH 121

Query: 123 TNMIHEKTGLVIDAYFSATKVRWILDHVPGAQERAEKGELLFGTIDTWLVWKLTDGLVHV 182
           T MIHEKTGLVIDAyFSATK+RWILDHVPGAQERAEKGELLFGTIDTWliWKLTDG VHV
Sbjct: 122 TKMIHEKTGLVIDAYFSATKIRWILDHVPGAQERAEKGELLFGTIDTWLVWKLTDGAVHV 181

Query: 183 TDYSNAARTMLYNIKELKWDDEILELLNIPKAMLPEVKSNSEVYGKTTPFHFYGGEVPIS 242
           TDYSNAARTMLYNIK+L WDDEILELLNIPK MLPEVKSNSE+YGKT FHFYGGEVPIS
Sbjct: 182 TDYSNAARTMLYNIKDLTWDDEILELLNIPKDMLPEVKSNSEIYGKTAAFHFYGGEVPIS 241

Query: 243 GMAGDQQAALFGQIAFEPG^KNTYGTGSFIIMNTGEEMQLSQNNLLTTIGYGINGKVHY 302
           GMAGDQQAALFGQLAFEPGMVKNTYGTGS FI IMNTG+EMQLS NNLLTTIGYGINGKVHY
Sbjct: 242 GMAGDQQAALFGQIAFEPGMVKNTYGTGSFIIMNTGDEMQLSSNNLLTTIGYGINGKVHY 301

Query: 303 ALEGSIFIAGSAIQWLRDGLRMIETSSESEGLAQSSTSDDEVYWPAFTGLGAPYWDSNA 362
           ALEGSIFIAGSAIQWLRDGL+MIETS ESE A +STSDDEVYWPAFTGLGAPYWDSNA
Sbjct: 302 ALEGSIFIAGSAIQWbRDGIiKMIETSPESEQFALASTSDDEVYWPAFTGLGAPYWDSNA 361

Query: 363 RGSVFGLTRGTSKEDFVKATLQSIAYQVRDVIDTMQVDSGIDIQQLRVDGGAAMNNLLMQ 422
           RGSVFGLTRGTSKEDFVKATLQSIAYQVRDVIDTMQVDSGIDIQQLRVDGGAAMNN+LMQ
Sbjct: 362 RGSWGLTRGTSKEDWKATLQSIAYQVRDVIDTMQVDSGIDIQQLRVT3GGAAMNNMLMQ 421

Query: 423 FQADILGIDIARAKNLETTALGAAFLAGLSVGYWESMDELKEIJ^ATGQLFQATMNESRKE 482
           FQADILGIDIARAKNLETTALGAAFLAGL+VGYWE MD LKEIiNATGQLF+A+MNESRKE
Sbjct: 422 FQADILGIDIARAKI^ETTALGAAFLAGLAVGYWEDMDALKELNATGQLFKASMNESRKE 481

Query: 483 KLYKGWRKAVKATQVFAQED 502
           KLYKGW++AVKATQVF QE+
Sbjct: 482 KLYKGWKRAVKATQVFTQEE 501

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
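
The identity and positive counts quoted for an alignment can in principle be recounted from the middle (match) line of each row: a letter marks an identical residue and a "+" marks a conservative substitution. The following Python sketch illustrates this on the final row of the GAS/GBS alignment above (residues 483 to 502 of the query). The function name `count_matches` is hypothetical, and the sketch assumes the three strings are intact and column-aligned, which is often not true of rows damaged during text extraction.

```python
def count_matches(query, match, sbjct):
    """Return (identities, positives, columns) for one alignment row.

    Assumes the three strings are column-aligned and equally long.
    """
    if not (len(query) == len(match) == len(sbjct)):
        raise ValueError("rows must be column-aligned and of equal length")
    # A letter on the match line means query and subject carry the same residue.
    identities = sum(
        1 for q, m, s in zip(query, match, sbjct) if m.isalpha() and q == m == s
    )
    # "+" marks a conservative substitution; positives include identities.
    positives = identities + match.count("+")
    return identities, positives, len(match)


row = count_matches(
    "KLYKGWRKAVKATQVFAQED",
    "KLYKGW++AVKATQVF QE+",
    "KLYKGWKRAVKATQVFTQEE",
)
print(row)  # (16, 19, 20)
```

Summing the per-row counts over an undamaged alignment should reproduce the totals in its summary line; a mismatch points to a row that needs checking against the original document.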

Example 935 

A DNA sequence (GBSx0992) was identified in S.agalactiae <SEQ ID 2845> which encodes the amino 
acid sequence <SEQ ID 2846>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3146 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 936

A DNA sequence (GBSx0993) was identified in S.agalactiae <SEQ ID 2847> which encodes the amino 
acid sequence <SEQ ID 2848>. This protein is predicted to be alpha-glycerophosphate oxidase (glpD). 
Analysis of this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.81 Transmembrane 20 - 36 ( 20 - 36)



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC34740 GB:U94770 alpha-glycerophosphate oxidase [Streptococcus pneumoniae] 
Identities = 464/608 (76%), Positives = 539/608 (88%)

Query: 1 MEFSRETRRLALQRMQDRTLDLLIIGGGITGAGVALQAAASGLDTGLIEMQDFAEGTSSR 60 

MEFS++TR L++++MQ+RTLDLLIIGGGITGAGVALQAAASGL+TGLIEMQDFAEGTSSR 
Sbjct: 1 MEFSKKTRELS I KKMQERTLDLLI IGGGITGAGVALQAAASGLETGLIEMQDFAEGTSSR 60 

Query: 61 STKLVHGGLRYLKQFDVEWSDWSERAWQQIAPHIPKPDPMLLPVYDEPGSTFSMFRL 120 

STKLVHGGLRYLKQFDVEWSDTVSERAWQQIAPHIPKPDPMLLPVYDE G+TFS+FRL 
Sbjct: 61 STKLVHGGLRYLKQFDVEWSDTVSERAWQQIAPHIPKPDPMLLPVYDEDGATFSLFRL 120 

Query: 121 KVaTTOLYDLLAGVTNTPAaNKVL^ 180 

KVAMDLYDLLAGV+NTP ANKVLS + VL+R+P+L+KEGL+GGGVYLDFRNNDARLVIEN 
Sbjct: 121 KVAMDLYDLIiAGVSNTPTANKVLSKDQVLERQPl^KKEGLVGGGVYLDFRNNDARLVIEN 180 

Query: 181 IKIUU>IRDG&YIASHVKAEDFLFDDNNQIIGVRARDLLTDQ^ 240 

IKRAN+DGA IA+HVKAE FLFD++ +1 GV ARDLLTDQV + 1 KARLVINTTGPWSD V 
Sbjct: 181 IKRANQDGALIANHVKAEGFLFDESGKITGWARDLLTDQVFEIKARLVINTTGPWSDKV 240 

Query: 241 RNFSNEGKQIHQLRPTKGVHLVVDRQKIjNISQPVYVDTGLNDGRMIFVLPREDKTYFGTT 300 

RN SN+G Q Q+RPTKGVHLWD K+ +SQPVY DTGL DGRM+FVLPRE+KTYFGTT 
Sbjct: 241 RNLSNKGTQFSQMRPTKGVHLVVDSSKIKVSQPVYFDTGLGDGRMVFVLPRENKTYFGTT 300 

Query: 301 DTDYHGDLEHPTVTKEDVDYLLNIVNKRFPEAELTIDDIESSWAGLRPLLSGNSASDYNG 360 

DTDY GDLEHP VT+EDVDYLL IVN RFPE+ +TIDDIESSWAGLRPL++GNSASDYNG 
Sbjct: 301 DTDYTGDLEHPKVTQEDVDYLLGIVNNRFPESNITIDDIESSWAGLRPLIAGNSASDYNG 360 

Query: 361 GNSGKLSDESFEELIDSVKDYIAHKNHREDVEKAISHVESSTSEKELDPSAVSRGSSFER 420 

GN+G +SDESF+ LI +V+ Y++ + REDVE A+S +ESSTSEK LDPSAVSRGSS +R 
Sbjct: 361 GNNGTISDESFDNLIATVESYLSKEKTREDVESAVSKLESSTSEKHLDPSAVSRGSSLDR 420 

Query: 421 DDNGLLTLAGGKITDYRKMAEGAMETIINILDKEYNRKFKLINSKTYPVSGGEINPSNVD 480 

DDNGLLTLAGGKITDYRKMAEGAME +++IL E++R FKLINSKTYPVSGGE+NP+NVD 
Sbjct: 421 DDNGLLTLAGGKITDYRKMAEGAMERWDILKAEFDRSFKLINSKTYPVSGGELNPANVD 480 

Query: 481 SEIEAYAQLGTLSGLSIEDARYIANLYGSNAPKLFALTRQITEAEGLSLVETLSLHYAMD 540 

SEIEA+AQLG GL ++A Y+ANLYGSNAPK+FAL + +A GLSL +TLSLHYAM 
Sbjct: 481 SEIEAFAQLGVSRGLDSKEAHYLANLYGSNAPKVFALAHSLEQAPGLSLADTLSLHYAMR 540 

Query: 541 YEMALSPTDFFLFJ^TNHMLF^^NI^SLIQPVIDEMAKHYQWSDQDKTFYEEELHETLKD 600 

E+ALSP DF LRRTNHMLFMRD+LDS+++PV+DEM + Y W++++K Y ++ L + 
Sbjct: 541 NELALS PVDFLLRRTNHMLFMRDSLDS I VEPVLDEMGRFYDWTEEEKATYRADVEAALAN 600 

Query: 601 NDLAALKD 608 

NDLA LK+ 
Sbjct: 601 NDLAELKN 608 



Final Results

bacterial membrane Certainty=0.1723 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



There is also homology to SEQ ID 128.

SEQ ID 2848 (GBS93) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 7 (lane 7; MW 70.6kDa). 

GBS93-His was purified as shown in Figure 192, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 937 

A DNA sequence (GBSx0994) was identified in S.agalactiae <SEQ ID 2849> which encodes the amino
acid sequence <SEQ ID 2850>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0965 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
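
Each "Final Results" block lists three candidate locations with a certainty value and a verdict. The following Python sketch shows one way to read such a block and pick the location with the highest certainty. It is illustrative only: the names `parse_final_results` and `most_likely` are hypothetical, and the pattern is deliberately tolerant of the stray spaces inside numbers (such as "0 . 0965") that text extraction can leave behind.

```python
import re

# location, certainty (digits may be split by stray spaces), verdict
RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (location, certainty, verdict) tuples."""
    rows = []
    for location, raw, verdict in RESULT.findall(text):
        certainty = float(re.sub(r"\s+", "", raw))  # "0 . 0965" -> 0.0965
        rows.append((location, certainty, verdict.strip()))
    return rows


def most_likely(rows):
    """Pick the row with the highest certainty value."""
    return max(rows, key=lambda row: row[1])


block = """
bacterial cytoplasm Certainty=0 . 0965 (Affirmative) < succ>
bacterial membrane Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
"""
print(most_likely(parse_final_results(block)))
```

For the block shown, the selected row is the cytoplasmic one, matching the only "Affirmative" verdict in the listing.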

Example 938 

A DNA sequence (GBSx0995) was identified in S.agalactiae <SEQ ID 2851> which encodes the amino
acid sequence <SEQ ID 2852>. This protein is predicted to be glycerol uptake facilitator protein (glpF).
Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.43 Transmembrane 220 - 236 ( 216 - 236)
INTEGRAL Likelihood = -6.48 Transmembrane 139 - 155 ( 136 - 158)
INTEGRAL Likelihood = -3.88 Transmembrane 87 - 103 ( 83 - 107)
INTEGRAL Likelihood = -3.03 Transmembrane 164 - 180 ( 162 - 183)

Final Results

bacterial membrane Certainty=0.3972 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8689> which encodes amino acid sequence <SEQ ID 8690> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 5
SRCFLG: 0
McG: Length of UR: 21
Peak Value of UR: 2.51
Net Charge of CR: -2
McG: Discrim Score: 4.43
GvH: Signal Score (-7.5): -0.139999
Possible site: 50
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 51
ALOM program count: 4 value: -7.43 threshold: 0.0
INTEGRAL Likelihood = -7.43 Transmembrane 215 - 231 ( 211 - 231)
INTEGRAL Likelihood = -6.48 Transmembrane 134 - 150 ( 131 - 153)
INTEGRAL Likelihood = -3.88 Transmembrane 82 - 98 ( 78 - 102)
INTEGRAL Likelihood = -3.03 Transmembrane 159 - 175 ( 157 - 178)
PERIPHERAL Likelihood = 4.98 65
modified ALOM score: 1.99
icml HYPID: 7 CFP: 0.397

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.3972 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA91618 GB:U12567 glycerol uptake facilitator [Streptococcus pneumoniae]
Identities = 150/230 (65%), Positives = 194/230 (84%), Gaps = 1/230 (0%)

Query: 7 DIFGEFLGTALLVLLGNGWAGWLPKTKNHNSGWIVITFGWGLAVAIAALVSGNISPAH 66
         ++FGEFLGT +L+LLGNGWAGWLPKTK+++SGWIVIT G+AVA+A VSG +SPAH
Sbjct: 4 ELFGEFLGTLILILLGNGWAGWLPKTKSNSSGWIVITMV-GIAVAVAVFVSGKLSPAH 62

Query: 67 LNPAVSLAFAIKGDLAWGTAILYMIAQIIGAMLGSLLVYLQFRPHYEAAENRADILGTFA 126
          LNPAV++ A+KG L W + + Y++AQ GAMLG +LV+LQF+PHYEA EN +IL TF+
Sbjct: 63 mPAVTIGVALKGGLPWASVLPYILAQFAGAMLGQILVWLQFKPHYEAEENAGNILATFS 122

Query: 127 TGPALKDNFSNFLSEVLGTLVLVLTIFAIGKYNMPPGVGTMSVGMLWGIGLSLGGTTGY 186
           TGPA+KD SN +SE+LGT VLVLTIFA+G Y+ G+GT +VG L+VGIGLSLGGTTGY
Sbjct: 123 TGPAIKDTVSNLISEILGTFVLVLTIFALGLYDFQAGIGTFAVGTLIVGIGLSLGGTTGY 182

Query: 187 AINPARDFGPRLLHALLPMKNKGDSDWTYSWIPIVGPMVGAILAALIFAM 236
           A+NPARD GPR++H++LP+ NKGD DW+Y+WIP+VGP++GA LA L+F++
Sbjct: 183 ALNPARDLGPRIMHSILPIPNKGDGDWSYAWIPVVGPVTGAALAVLVFSL 232

A related DNA sequence was identified in S.pyogenes <SEQ ID 2853> which encodes the amino acid
sequence <SEQ ID 2854>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -9.13 Transmembrane 213 - 229 ( 209 - 232)
INTEGRAL Likelihood = -5.52 Transmembrane 137 - 153 ( 132 - 157)
INTEGRAL Likelihood = -4.35 Transmembrane 159 - 175 ( 155 - 178)
INTEGRAL Likelihood = -1.17 Transmembrane 85 - 101 ( 85 - 101)

Final Results

bacterial membrane Certainty=0.4652 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAA91618 GB:U12567 glycerol uptake facilitator [Streptococcus pneumoniae]
Identities = 159/230 (69%), Positives = 196/230 (85%), Gaps = 1/230 (0%)

Query: 2 DIFGEFLGTALLVLLGNGWAGWLPKTKrHASGWIVIATGWGIAVAVAVFISGKVAPAH 61
         ++FGEFLGT +L+LLGNGWAGWLPKTK+++SGWIVT T GIAVAVAVF+SGK++PAH
Sbjct: 4 ELFGEFLGTLILILLGNGWAGWLPKTKSNSSGWIVI-TMVGIAVAVAVFVSGKLSPAH 62

Query: 62 LNPAVSLAFAMSGTIAWSTAIAYSIAQLLGAMVGSTLVFLQFRPHYLAAESQADILGTFA 121
          LNPAV++ A+ G + W++ + Y LAQ GAM+G LV+LQF+PHY A E+ +IL TF+
Sbjct: 63 I^PAVTIGVALKGGLPWASVLPYIIjAQFAGAMLGQILVWLQFKPHYEAEENAGNILATFS 122

Query: 122 TGPAIRDTSSNLLSEIF.GTFVLMLGILAFGLYDMPAGLGTLCVGTLVIGIGLSLGGTTGY 181
           TGPAI+DT SNL+SEI GTFVL+L I A GLYD AG+GT VGTL++GIGLSLGGTTGY
Sbjct: 123 TGPAIKDTVSNLISEILGTFVLVLTIFALGLYDFQAGIGTFAVGTLIVGIGLSLGGTTGY 182

Query: 182 AINPARDLGPRLVHAILPI^KGDSDWSYAWIPWGPIIGAVIAVLLFQV 231
           A+NPARDLGPR+ +H+ ILP+ NKGD DWSYAWIPWGP+IGA LAVL+F +
Sbjct: 183 ALNPARDLGPRIMHSILPIPNKGDGDWSYAWIPWGPVIGAALAVLVFSL 232

An alignment of the GAS and GBS proteins is shown below.

Identities = 169/232 (72%), Positives = 202/232 (86%)

Query: 6 MDIFGEFLGTALLVXiLGNGWAGVVLPKTKNHNSGWIVITFGWGLAVAIAALVSGNISPA 65
         MDIFGEFLGTALLVLLGNGWAGWLPKTK H SGWIVI GWG+AVA+A +SG ++PA
Sbjct: 1 MDIFGEFLGTALLVLLGNGWAGVvLPKTramSGWIVrATGWGIAVAVAVFISGKVAPA 60

Query: 66 HI^PAVSIAFAIKGDIAWGTAILYMIAQIIGAMLGSLLVYLQFRPHyEAAENRADILGTF 125
          HLNPAVSLAFA+ G +AW TAI Y +AQ++GAM+GS LV+LQFRPHY AAE++ADILGTF
Sbjct: 61 HLNPAVSLAFAMSGTIAWSTAIAYSLAQLLGAMVGSTLVFLQFRPHyiiAAESQADILGTF 120

Query: 126 ATGPALKDNFSNFLSEVLGTLVLVLTIFAIGKYNMPPGVGTMSVGMLWGIGLSLGGTTG 185
           ATGPA++D SN LSE+ GT VL+L I A G Y+MP G+GT+ VG LV+GIGLSLGGTTG
Sbjct: 121 ATGPAIRDTSSNLLSEIFGTFVLMLGILAFGLYDMPAGLGTLCVGTLVIGIGLSLGGTTG 180

Query: 186 YAINPARDFGPRLLHALLPMKNKGDSDWTYSWIPIVGPMVGAILAALIFAMM 237
           YAINPARD GPRL+HA+LP+ NKGDSDW+Y+WIP+VGP++GA+LA L+F +M
Sbjct: 181 YAINPARDLGPRLVHAILPLNNKGDSDWSYAWIPWGPIIGAVLAVLLFQVM 232

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
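
The "INTEGRAL Likelihood ... Transmembrane" lines reported in this example follow a fixed pattern and can be compared between the two GBS sequences. The Python sketch below parses the four segments listed for SEQ ID 2852 and for SEQ ID 8690 and shows that the segment positions differ by a constant five residues. The names `parse_segments`, `span` and `paren_span` are hypothetical; the meaning of the range in parentheses is not stated in this section, so the sketch only stores it.

```python
import re

SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return predicted segments sorted by start position."""
    segments = []
    for lik, start, end, p_start, p_end in SEGMENT.findall(text):
        segments.append({
            "likelihood": float(lik),
            "span": (int(start), int(end)),
            "paren_span": (int(p_start), int(p_end)),  # range given in parentheses
        })
    return sorted(segments, key=lambda seg: seg["span"][0])


seq_2852 = """
INTEGRAL Likelihood = -7.43 Transmembrane 220 - 236 ( 216 - 236)
INTEGRAL Likelihood = -6.48 Transmembrane 139 - 155 ( 136 - 158)
INTEGRAL Likelihood = -3.88 Transmembrane 87 - 103 ( 83 - 107)
INTEGRAL Likelihood = -3.03 Transmembrane 164 - 180 ( 162 - 183)
"""
seq_8690 = """
INTEGRAL Likelihood = -7.43 Transmembrane 215 - 231 ( 211 - 231)
INTEGRAL Likelihood = -6.48 Transmembrane 134 - 150 ( 131 - 153)
INTEGRAL Likelihood = -3.88 Transmembrane 82 - 98 ( 78 - 102)
INTEGRAL Likelihood = -3.03 Transmembrane 159 - 175 ( 157 - 178)
"""

a, b = parse_segments(seq_2852), parse_segments(seq_8690)
offsets = {x["span"][0] - y["span"][0] for x, y in zip(a, b)}
print(offsets)  # {5}
```

The same offset of five separates the two "Possible site" values (55 and 50) quoted for these sequences.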

Example 939 

A DNA sequence (GBSx0996) was identified in S.agalactiae <SEQ ID 2855> which encodes the amino 
acid sequence <SEQ ID 2856>. This protein is predicted to be NADH oxidase. Analysis of this protein 
sequence reveals the following: 

Possible site: 23

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -2.87 Transmembrane 152 - 168 ( 152 - 168)

Final Results

bacterial membrane Certainty=0.2147 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9523> which encodes amino acid sequence <SEQ ID 9524>
was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA48728 GB:X68847 NADH oxidase [Enterococcus faecalis]
Identities = 105/423 (24%), Positives = 197/423 (45%), Gaps = 15/423 (3%)

Query: 10 IVILGASFAGMTCAQKLRQLNPNWDIVLIDKEIHPDYVPNGIjNWYYRHEISGLNQAMWQT 69
          +V++G + AG + + + +P ++ + ++ + ++ G+ Y + +
Sbjct: 3 WWGCTHAGTSAVKSIIjANHPEAEVTVYERNDNI^^ 62

Query: 70 EEEQRLQNIRCLFGLKVEKINKEDR ELMLSDGSSVYYDQLICAMGSQAESTYIDG 124
          EE VE+IN +D+ L +V YD+L+ GS I G
Sbjct: 63 PEELASLGATvKMEHNVEEINVDDKTVTAKNLQTGATETVSYDKLVMTTGSWPIIPPIPG 122

Query: 125 ADAQGVLTTKTYATSQNAKQVLDKSHKmWGAGIIGLDIAYSLHESGKAVTLLEAQERP 184
           DA+ +L K Y+ + + + +V WG G IG+++ + ESGK VTL++ +R
Sbjct: 123 IDAENILLCKNYSQAOTIIEKAKDAKRVVvVGGGYIGIELVEAFVESGKQVTLVDGLDRI 182

Query: 185 DFRHTDPDMSLPLLDAMAESKLHFFQNQKVEKITVTREEKLCLRTLTGDTFTVDAVILAV 244
           ++ D + L + + ++ + V++ + K+ F D VI+ V
Sbjct: 183 LNKYLDKPFTDVLEKELVnRGVNIALGFJW^ 242

Query: 245 NFRPDSRLLTGLVDLSVDNSWVNDYFQTSDPNIYAIGDLIWSYFKGLNSAYYMPLINQA 304
           FRP++ LL VD+ + ++ VN+Y QTS+P+I+A GD ++ + Y+PL A
Sbjct: 243 GFRPNTELLKDKVDMLPNGAIEVNEYMOTSNPDIFAAGDSAVTOYNPSQTKNYIPLATNA 302

Query: 305 IRSAQMLAYHLSGHAVPKLKITRATGSKHFGYYRANIGLT ELEAGFYEDTV 355
           +R ++ +L+ + +G FG+ + G+T ++EA +ED
Sbjct: 303 TOQGMLVGRl^TEQKLAYRGTQGTSGLYLFGWKIGSTGVTKESAKLNGLDVEATVFEDNY 362

Query: 356 SVTYFPKEQYDLRIKLIANQKTGHLLGAQLISKENCLATANQLVQAISCDMTDFDIAFQD 415
           + P + L ++L+ + T ++G QL+SK + +AN L A+ MT DLA D
Sbjct: 363 RPEFMPTTEKVL-MELVYEKGTQRIVGGQLMSKYDITQSANTLSLAVQNKMTVEDLAISD 421

Query: 416 FIY 418
           F +
Sbjct: 422 FFF 424

A related DNA sequence was identified in S.pyogenes <SEQ ID 2857> which encodes the amino acid
sequence <SEQ ID 2858>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.35 Transmembrane 155 - 171 ( 155 - 173)

Final Results

bacterial membrane Certainty=0.2338 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



RGD motif: 54-56 



The protein has homology with the following sequences in the databases:

>GP:CAA44611 GB:X62755 NADH peroxidase [Enterococcus faecalis]
Identities = 111/428 (25%), Positives = 202/428 (46%), Gaps = 24/428 (5%)

Query: 10 VIGASFAGLAFVDKYKDLNPDSQIILIDKESCPNYIPNGINQLFRGDIQDLSDAMWGRAC 69
          V+G+S G V++ +L+PD++I +K +++ G+ G ++D++ R
Sbjct: 5 VLGSSHGGYEAWELLNLHPDAEIQWYEKGDFISFLSCGMQLYLEGKVKDVNSV RYM 61

Query: 70 LAAQIESN--HRFIQAEVI^IFJ^PSNTLLLKDS-QGRVFEEGYETLVCAMGASPQSHYIE 126
          ++ES + F E+ AI+ + + +KD G E Y+ L+ + GA P I
Sbjct: 62 TGEKMESRGVNVFSNTEITAIQPKEHQVTVKDLVSGEERVENYDKLIISPGAVPFELDIP 121

Query: 127 TSQTNKVLVTKYYEESQASLKLIEASQE VLVIGAGLIGLDLAYSLSLQGKRVKLI 181
           + + + + Q ++KL + + + V+VIG+G IG++ A + + GK+V +I
Sbjct: 122 GKDLDNIYLMR---GRQWAIKLKQKTVDPEVNNVWIGSGYIGIEAAEAFAKAGKKVTVI 178

Query: 182 FJUVERPDFYQTDAELIAPVMAEMSTHHVTFINNKRVTAIHEIEGKVVAHTEQGDTFQGDL 241
           + +RP D E + EM +++T + V +E +G+V + + DL
Sbjct: 179 DILDRPLGVYLDKEFTDVLTEEMEANNITIATGETVER-YEGDGRVQKVVTDKNAYDADL 237

Query: 242 AILAINFRPNTHLLQGQVA<mLDKTILVMENLQTSQANIYAIGDMVSLHFGILGMDYYTP 301
           ++A+ RPNT L+G + + I +E ++TS+ +++A+GD + + +
Sbjct: 238 VWAVGvRPNTAWLKGTLELHPNGLIiCrDEYMRTSEPDVFAVGDATLIKYNPADTEVNIA 297

Query: 302 LINQAMKTGQAIALHLAGYPIPPLQTVK-VLGSSHFDYYRASVGVTE EEAELY 353
           L A K G+ +L P+ P V+ G + FDY AS G+ E +E +
Sbjct: 298 LATNARKQGRFAVKNLE-EPVTCPFPGVQGSSGIAVFDYKFASTGINEVMAQKLGKETKAV 356

Query: 354 I©TCSYLYQNGDSKNLFWLKLIARKTK3ILIGAQLLSKTNALVIANQLGQALALKVTDAD 413
           YL K W KL+ ++GAQL+SK + N + A+ K+T D
Sbjct: 357 TVVEDYLMDFNPDKQKAWFKLVYDPETTQILGAQLMSKADLTANINAISLAIQAKMTIED 416

Query: 414 LAFQDFLF 421
           LA+ DF F
Sbjct: 417 LAYADFFF 424

An alignment of the GAS and GBS proteins is shown below.

Identities = 192/440 (43%), Positives = 276/440 (62%), Gaps = 7/440 (1%)

Query: 8 KVIVILGASFAGMTCAQKLRQLNPNWDIVLIDKEIHPD^ 67
         K I ++GASFAG+ K + LNP+ I+LIDKE P+Y+PNG+N +R +I L+ AMW
Sbjct: 6 KTIHVIGASFAGLAFVDKYKDLNPDSQIILIDKESCPNYIPNGINQLFRGDIQDLSDAMW 65

Query: 68 -QTEEEQRLQNIRCLFGLKVEKINKEDRELMLSDGSSVY YDQLICAMGSQAESTYI 122
          + ++++ +V I L+L D Y+ L+CAMG+ +S YI
Sbjct: 66 GRACIAAQIESlfflRFIQAEVIAIEAPSISrrLLLKDSQGRVFEEGYETLVCAMGASPQSHYI 125

Query: 123 DGADAQGVLTTKTYATSQNAKQVLDKSHKVAWGAGIIGLDIAYSLHESGKAVTLLEAQE 182
           + + VL TK Y SQ + ++++ S +V V+GAG+IGLD+AYSL GK V L+EA E
Sbjct: 126 ETSQTNKVLVTKYYEESQASLKLIEASQEVLVIGAGLIGLDLAYSLSLQGKRVKLIEAAE 185

Query: 183 RPDFRHTDPDMSLPLLDAMAESKLHFFQNQKVEKITVTREEKLCLRTLTGDTFTVDAVIL 242
           RPDF TD ++ P++ M+ + F N++V I E K+ T GDTF D IL
Sbjct: 186 RPDFYQTDAELIAPVMAEMSTHHVTFINNKRVTAIHEI-EGKWAHTEQGDTFQGDLAIL 244

Query: 243 AVNFRPDSRLLTGLVDLSVDNSVWNDYFQTSDPNIYAIGDLIWSYFKGLNSAYYMPLIN 302
           A+NFRP++ LL G V ++D +++VN+ QTS NIYAIGD++ +F L YY PLIN
Sbjct: 245 AINFRPNTHLLQGQVACALDKTILVNENLQTSQANIYAIGDMVSLHFGILGMDYYTPLIN 304

Query: 303 QAIRSAQMLAYHLSGHAVPKLKITRATGSKHFGYYRANIGLTELEAGFYEDTVSVTYFPK 362
           QA+++ Q LA HL+G+ +P L+ + GS HF YYRA++G+TE EA Y DT S Y
Sbjct: 305 QAMKTGQALALHIAGYPIPPLQTVKVLGSSHFDYYRASVGVTEEEAELYMDTCSYLYQNG 364

Query: 363 EQYDL-RIKLIANQKTGHIiLGAQLISKENCLATANQLVQAISCDMTDFDLAFQDFIYTAR 421
           + +L +KLIA + G L+GAQL+SK N L ANQL QA++ +TD DLAFQDF++
Sbjct: 365 DSKNLFWLKLIARKTDGILIGAQLLSKraALVIANQLGOA 424

Query: 422 ESEMAYMLHQAAINLYEKRI 441
           S++AY LH+A + L+EKR+
Sbjct: 425 HSDIAYHLHEACLKLFEKRL 444

There is also homology to SEQ IDs 1820, 1876, 4666. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 940 

A DNA sequence (GBSx0998) was identified in S.agalactiae <SEQ ID 2859> which encodes the amino 
acid sequence <SEQ ID 2860>. Analysis of this protein sequence reveals the following: 

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2980 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 941

A DNA sequence (GBSx0999) was identified in S.agalactiae <SEQ ID 2861> which encodes the amino
acid sequence <SEQ ID 2862>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3548 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 942 

A DNA sequence (GBSx1000) was identified in S.agalactiae <SEQ ID 2863> which encodes the amino
acid sequence <SEQ ID 2864>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1685 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9525> which encodes amino acid sequence <SEQ ID 9526> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2865> which encodes the amino acid
sequence <SEQ ID 2866>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3125 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below.

Identities = 179/476 (37%), Positives = 279/476 (58%), Gaps = 5/476 (1%)

Query: 1 ^IEALMEKERRVQYRLLSFLRGSPQAIALIOjaLETCLSRATFLKYINNLNSYFEQEKV 60
         M+1E LM+KERR QYRLL L + + + LK + + LS+ T LKYI+NIiN ++ +
Sbjct: 21 MKIEDLMDKERRAQYRLLVTLYHAKETLRLKDLMRLSNLSKVTLLKYIDNIJSffiLCREQGL 80

Query: 61 NCRIvYYKDKLFLEEDYNLSNQEVLKALMKDSIKYTILISLFNQRQFTIVGLSQEIjMVSE 120
          C+++ KD L L+E+ ++++ L+K+S+ Y IL ++ F I LS ELMVSE
Sbjct: 81 ACQLLLEKDSLSLKENGQFHWEDLVALLLKESVAYQILTYMYCHEHFNITNLSVELMVSE 140

Query: 121 ATIiNRHIAHIJffiLIAEFDIAISCjGKQIGDELQVffiYFYYELFKQLWSYDKCQNMIKKLDLD 180
           ATLNR IiAHLN+LL+EFD+A+SQG+Q+G ELQWRYFY+ELF+ + ++ +LD
Sbjct: 141 ATI^IRQLflHLNQLLSEFDLALSQGRQLGSELQWRYFYFELFRHTLTRQGID7iLVNQLDAS 200

Query: 181 SLILLIERJ^QHTLTREAHQNLGLWFSICTHRLLAMEKISDNLKPIVKHYQCNAFYKRLD 240
           [match line illegible in source]
Sbjct: 201 HIATLIERLIGQSLSAEALEQLLIWIjAISQARMSFQKSYNDHFLRDSDFMTSNIFFKRLE 260

Query: 241 AMiVLYMSRFALEYREGEVLATFAF 300
Sbjct: 261 SMLLHYLRRYALEFDAFFAKSLFVFLHAYPLLPIASMKYSLGFGGPIADHISEALWLLKK 320

Query: 301 ESILADETSDQVIYQLGQLYSHYYFFKGHILVEQPDLEQTYRLIDHNMRDKLHHISKKII 360
           [match line illegible in source]
Sbjct: 321 AHVIIHQTKEEIIYGLGIFFSKAYFFKGAILSQPTNSQYLYQLVGEDKRALLRVIINHLV 380

Query: 361 ANVNRIRPLTEDGCSLLTLHLLELLIFSKNSQKMPFRIGLDMTGNAVEQSLLEYRIRQHF 420
           +++ D L+ +L LLIFS P +GL + N VE ++ E IR+H
Sbjct: 381 LQMDQ ETDFSQQLSDDILALLIFSIERHBEPLLVGIALGQNKVEAAIAELAIRRHL 436

Query: 421 SGNNSIQVEPYDEGKGFD-MVIYQSHSRPYKAKLTYCIiNKGASERELQEIDSLIYD 475
           Q+ PYD K +D ++ YQ+ P + Y L + +S EL +++ + D
Sbjct: 437 GHRRDFQLMPYDHQKVYDCLITYQTVCLPRQDLPYYRLKQYSSPYELTALEAFLKD 492
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 943

A DNA sequence (GBSx1001) was identified in S.agalactiae <SEQ ID 2867> which encodes the amino
acid sequence <SEQ ID 2868>. This protein is predicted to be transketolase (tktA-1). Analysis of this
protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2084 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9527> which encodes amino acid sequence <SEQ ID 9528> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06071 GB:AP001515 transketolase [Bacillus halodurans]
Identities = 403/661 (60%), Positives = 520/661 (77%), Gaps = 8/661 (1%)

Query: 6 IDQLAVNTTOTLSIDAIQARNSGHPGLPMGARPMAYVLWNKFLNWPKTSRNVm^RFV 65
         ++QLAVNT+RTLSID+++ ANSGHPG+PMGAAPMA+ LW KF+N NP + +W NRDRFV
Sbjct: 5 VEQIAVOTIRTLSIDSVEKANSGHPGMPMGAAPMAFCLOT 63

Query: 66 LSAGHGSALLYSLLHLAGYDLSIDDLKQFRQWGSKTPGHPEVNHTDGVEATTGPLGQGIA 125
          LSAGHGS LLYSLLHL GYDLS+++L+ FRQWGSKTPGHPE HT GVEATTGPLGQG+A
Sbjct: 64 LSAGHGSMLLYSLLHLTGYDLSLEELQNFRQWGSKTPGHPEYGHTPGVEATTGPLGQGVA 123

Query: 126 NAVGMAMAEAHLAAKFNKPGFDLVDHYTYTLHGDGCLMEGVSQEAASIAGHLKLGKLVLL 185
           AVGMAMAE HLAA +N+ G+++VDHYTYT+ GDG LMEGVS EAASLAGHLKLG+++LL
Sbjct: 124 ^VGMAMAERHIAATYNRDGYNI TOHYTYTICGDGDL^ 183

Query: 186 YDSNDISLDGPTSQSFTEDVKGRFESYGWQHILVKDGNDLEAIAAAIEAAKAETDKPTII 245
           YDSNDISLDG SF+E V+ RF++YGW + V+DGN+L+ IA AIE AKA+ ++P++I
Sbjct: 184 YDSNDISLDGDLHHSFSESVEDRFKAYGWHVTOVEIX3NNLDEIAKAIEEAKAD-ERPSLI 242

Query: 246 EVXTIIGFGAEKOGTSSV-HGAPLGAEGITFAKKAYVWEYP-DFTVPAEVADRFASDLQA 303
           EVKT IGFG+ +G SV HGAPLGA+ + K+AY W Y +F +P EVA + ++
Sbjct: 243 EVKTTIGFGSPNKGGKSVSHGAPLGADEVKLTKEAYEWTYENEFHIPEEVA-AYYEQVKQ 301

Query: 304 RGAKAEEAWNDLFAKYEVEYPELATEYKEAFAG QAETVELKAHDLGSSVASRVSSQQ 360
           +GA+ EE+WN+LFA+Y+ YPEIA++++ AG + ++++G SVA+R SS +
Sbjct: 302 QGAEKEESWNELFAQYKKAYPELASQFELAVHGDLPEGWDAVAPSYEVGKSVATRSSSGE 361

Query: 361 AIOOI.^TOLPNLWRGSADLSASNNTMVAAETDFOASNYAGI^IWFGWEFAMAAaMNGIA 420
           A+ + +P L+GGSADL++SN T++ E +F +Y+GRN+ WFGVRE FAM AAMNG+A
Sbjct: 362 AIiNAFAKTVPQLFGGSADIASSNKTLT^^ 421

Query: 421 [sequence illegible in source] 480
           LHGG +V+G TFFVFS+YL PA+R+AAL LP +YV THDS IAVGEDGPTHEP+EQLAS+
Sbjct: 422 LHGGLKVFGATFFVFSDYLRPAIRLAALMQLPVIYVFTHDSIAVGEDGPTHEPVEQIASL 481

Query: 481 RSMPNIjNVIRPADGNETNAAWQRAVSETDRPTMLVLTRQNLPVLEGTSEIAQEGVNKGAY 540
           R+MP L+VIRPADGNE+ AAW+ A+ D+PT LVL+RQNLP LEG + A +GV+KGAY
Sbjct: 482 RAMPGLSVIRPADGNESVAAWKLALESKDQPTALVLSRQMLPTLEGAVDRAYDGVSKGAY 541

Query: 541 ILSEAKGELDGIIIATGSEVKIALDTQDKLESEGIHVRWSMPAQNIFDEQEASYQEQVL 600
           +L+ AG D +++A+GSEV LA++ ++ LE EGIH WSMP+ + F+ Q A Y+E+VL
Sbjct: 542 VLAPANGSADT,T,TJ,ASGSEVSIAWAKEALEKEGIHAAWSMPSWDRFEAQSAEYKEEVL 601

Query: 601 PSAVTKRIAIEAGSSFGWGKYVGLNGLTLTIDTWGASAPGNRIFEEYGFTVENAVSLYKEL 661
           PS VT RLAIE GSS GW KYVG G + ID +GASAPG RI EE+GFTV++ V+ K L
Sbjct: 602 PSDVTARLAIEMGSSLGWAKYVGNQGDWAIDRFGASAPGERIMEEFGFTVQHWARAKAL 662

There is also homology to SEQ ID 520. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 944 

A DNA sequence (GBSx1002) was identified in S.agalactiae <SEQ ID 2869> which encodes the amino
acid sequence <SEQ ID 2870>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence



A related GBS nucleic acid sequence <SEQ ID 9529> which encodes amino acid sequence <SEQ ID 9530> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2871> which encodes the amino acid 
sequence <SEQ ID 2872>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.4477 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

Final Results 

bacterial cytoplasm --- Certainty=0.4581 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 



Identities = 27/79 (34%), Positives = 45/79 (56%) 



WO 02/34771 



-1035- 



PCT/GB01/04789 



Query: 3 MKKECRDFYRQIQHTYNDISVREDAVLSSILLSASNGLIKTSDVPRVAYELTQQLENNEI 62 

M+K+ + Y 1+ Y+ RE+ LS +LL+ASN LIK S+ VAY+L Q ++N + 
Sbjct: 1 MEKKRQRLYDVIRQAYDYPENREOTALSQIiLLftASNRLIKHSNPLLVAYQLNQDVDNYLIj 60 


Query: 63 EKSFESIATVKELKKSAKK 81 

+ ++ K+S +K 

Sbjct: 61 BNDILLPKSLCRFKQSLEK 79 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
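
The identity and positive counts quoted for each alignment can in principle be recomputed from the three lines of an alignment block. The following is a minimal sketch in Python, assuming the query, match and subject strings are of equal length with their original spacing intact (which the listings reproduced here do not always preserve). The names `alignment_stats` and `percent` are hypothetical and are not part of the analysis software used in this work; truncation to a whole percentage is likewise an assumption about the report format.

```python
def alignment_stats(query: str, match: str, sbjct: str):
    """Return (identities, positives, gaps, length) for one aligned block.

    Assumed conventions: a letter in the match line marks an identical
    pair, '+' marks a conservative substitution, and '-' in either
    sequence marks a gap.
    """
    if not (len(query) == len(match) == len(sbjct)):
        raise ValueError("query, match and subject lines must have equal length")
    identities = sum(1 for ch in match if ch.isalpha())
    positives = identities + match.count("+")
    gaps = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, positives, gaps, len(query)


def percent(count: int, length: int) -> int:
    """Whole-number percentage, truncated (assumed report convention)."""
    return count * 100 // length


# Toy block with illustrative residues only (not taken from any SEQ ID).
ident, pos, gaps, length = alignment_stats("MKRHFEI", "MKR FE+", "MKRQFEV")
print(ident, pos, gaps, length, percent(ident, length))
```

Counts from consecutive blocks of the same alignment would be summed before the percentage is taken.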



Example 945 

A DNA sequence (GBSx1003) was identified in S.agalactiae <SEQ ID 2873> which encodes the amino 
acid sequence <SEQ ID 2874>. This protein is predicted to be ABC transporter, ATP-binding protein. 
Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2610 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB49925 GB:AJ248286 ABC transporter, ATP-binding protein [Pyrococcus abyssi] 
Identities = 96/243 (39%), Positives = 164/243 (66%), Gaps = 2/243 (0%) 

Query: 1 MIKFEHVSKVYGEKEALSDLTLSVKDGEIFGLIGHNGAGKTTTISILTSIIDATYGQVYI 60 
MI E++ K +G KE L ++ +VKDGEI+GL+G NG+GK+TT+ IL+ II G+V + 

Sbjct: 1 MIIVENLRKRFGGKEVLKGISFTVKDGEIYGLLGPNGSGKSTTMRILSGIITDFEGKVIV 60 

Query: 61 DDLLLTEHRDQIKKKIGYVPDSPDIFLNLTAEEYWYFLAKIYDVAPEDIEARITKLVDIF 120 
+ + + Q+K+ +GYVP++P ++ +LT E++ F+ + + + +E R+ KLV+ F 
Sbjct: 61 GGVEVAKDPLQVKRIVGYVPETPALYESLTPAEFFSFVGGVRGIPKDILEERVRKLVEAF 120 

Query: 121 ELEEQRYNPIESFSHGMRQKVIVIGALLPNPDIWILDEPLTGLDPQASFDLKEMMKEHAK 180 

E+++ I + S G +QK+ +I +LL +P + ILDE + GLDP+++ +E++ E + 

Sbjct: 121 EIKKYMNQLIGTLSFGTKQKISLISSDLHDPKVLILDEAMNGLDPKSARIFRELLYEFKE 180 


Query: 181 NGKTVIFSTHVLAVAEQLCDRIGILKQGKLIFVGSLGELKMKYPDKDLETIYLELAGRQA 240 

GK+++FSTHVLA+AE +CDR+GI+ QG++I G++ ELK ++ LE ++L+L QA 
Sbjct: 181 EGKSIVFSTHVLALAELICDRVGIIYQGRIIAEGTVEELKEISKEERLEDVFLKLT--QA 238 

Query: 241 SRE 243 

E 

Sbjct: 239 KEE 241 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2875> which encodes the amino acid 
sequence <SEQ ID 2876>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2723 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 






Identities = 182/244 (74%), Positives = 215/244 (87%) 

Query: 1 MIKFEHVSKVYGEKEALSDLTLSVKDGEIFGLIGHNGAGKTTTISILTSIIDATYGQVYI 60 

MI+F+HVSK+YG+KEALSDL +++ DGE I FGLIGHNGAGKTTTI S I LTS 1 1 +A+YG+V++ 
Sbjct: 1 MIEFKHVSKLYGDI<EALSDLNVTII03EIFGLIGHWGAGKTTTISILTSIIEASYGEVFV 60 

Query: 61 DDLLLTEHRDQIKKKIGYVPDSPDIFLNLTAEEYWYFLAKIYDVAPEDIEARITKLVDIF 120 

D LLTE+R+ IKK+I YVPDSPDIFLNLT EYW FLAKIY V+ ED E R+ +L +F 
Sbjct: 61 DGQLLTENREAIKKQIAYVPDSPDIFimTPNEYWQFLAKIYGVSDEDREERIiAQLTTLF 120 

Query: 121 ELEEQRYNPIESFSHGMRQKVIVIGALLPNPDIWILDEPLTGLDPQASFDLKEMMKEHAK 180 

EL+E+ I +SFSHGMRQKVI VIGAL+ NP+IWILDEPLTGLDPQASFDLKEMMK HA 
Sbjct: 121 ELKEEVNQTIDSFSHGMRQKVIVIGALVSNPNIWILDEPLTGLDPQASFDLKEMMKAHAA 180 

Query: 181 NGKTVIFSTHVLAVAEQLCDRIGILKQGKLIFVGSLGELKMKYPDKDLETIYLELAGRQA 240 

+G TV+FSTHVL+VAEQLCDRIGILK+GKLIFVG++ ELK +PDKDLE+IYLELAGR+A 
Sbjct: 181 SGHTVLFSTHVLSVAEQLCDRIGILKKGKLIFVGTIDELKEHHPDKDLESIYLEIaAGRKA 240 

Query: 241 SREG 244 
EG 

Sbjct: 241 QEEG 244 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
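
The "Final Results" lines of the localisation analysis have a regular layout (site, certainty value, call), so they can be tabulated mechanically. The sketch below is illustrative only: the function names `parse_final_results` and `most_likely_site` are hypothetical, and the pattern deliberately tolerates stray spaces inside the certainty value (for example "0. 2723"), on the assumption that such spaces are transcription artefacts rather than part of the program output.

```python
import re

# One "Final Results" line: site name, certainty value, call in parentheses.
FINAL_RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*"
    r"Certainty\s*=\s*([0-9][0-9. ]*?)\s*\(([^)]*)\)"
)


def parse_final_results(text: str) -> dict:
    """Map each site to (certainty, call) for every line found in text."""
    results = {}
    for site, value, call in FINAL_RESULT_RE.findall(text):
        # Remove stray spaces such as "0. 2723" before converting.
        results[site] = (float(value.replace(" ", "")), call.strip())
    return results


def most_likely_site(results: dict) -> str:
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


example = """
bacterial cytoplasm --- Certainty=0. 2723 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
"""
parsed = parse_final_results(example)
print(parsed, most_likely_site(parsed))
```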

Example 946 

A DNA sequence (GBSx1004) was identified in S.agalactiae <SEQ ID 2877> which encodes the amino 
acid sequence <SEQ ID 2878>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 
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INTEGRAL Likelihood = -4.78 Transmembrane 125 - 141 ( 121 - 148) 
INTEGRAL Likelihood = -4.51 Transmembrane 76 - 92 ( 71 - 98) 
INTEGRAL Likelihood = -3.56 Transmembrane 406 - 422 ( 400 - 426) 



The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2879> which encodes the amino acid 
sequence <SEQ ID 2880>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 
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Final Results 



bacterial membrane --- Certainty=0.6371 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



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INTEGRAL Likelihood = -4.99 Transmembrane 446 - 462 ( 444 - 464) 
INTEGRAL Likelihood = -4.25 Transmembrane 369 - 385 ( 367 - 387) 
INTEGRAL Likelihood = -0.80 Transmembrane 87 - 103 ( 87 - 104) 
INTEGRAL Likelihood = -0.11 Transmembrane 334 - 350 ( 334 - 350) 



Final Results 

bacterial membrane --- Certainty=0.6731 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



A related sequence was also identified in GAS <SEQ ID 9173> which encodes the amino acid sequence 
<SEQ ID 9174>. Analysis of this protein sequence reveals the following: 



Possible site: 51 

>>> Seems to have no N-terminal signal sequence 
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Final Results 

bacterial membrane --- Certainty=0.673 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 255/542 (47%), Positives = 378/542 (69%), Gaps = 12/542 (2%) 



Query: 1 MfcWSRIWELVKINILYSNPQTLSALRKKQEKHPKKEFSAYKSMFRNQLFQILLFSIIYVF 60 

MNWS IWEL+KINILYSNPQ+L+ L+K+QEKHPK+ F AYKSM R Q I +F +IY+F 
Sbjct: 15 MNWSTIWELIKINILYSNPQSIANLKKRQEKHPKENFKAYKS^MRQQALMIAMFLVIYLF 74 

Query: 61 LFVSLDFKEYPGYFTFYIGIFTLVSIIYSFIAMYSVFYESDDVKQYAYLPIKSEELYVAK 120 

+F+ +DF YPG F+F + +F ++S + +F ++Y++FYES+D+K Y +LP+ SEELY+AK 
Sbjct: 75 MFIGVDFSHYPGLFSFDVAMFFIMSTLTAFSSLYTIFYESNDLKLYIHLPVTSEELYIAK 134 

Query: 121 IFATFGMSVTFLMPILTLMIVAYWRIIGGPLAVLLAIINFAILFLSVTVISLYINSLIGR 180 

I ++ GM FLMP+++L+++AYW+++G PL++L+AI+ F +L +S V+++YIN+ +G+ 
Sbjct: 135 IVSSLGMGAVFLMPLISLLLIAYWQLLGNPLSILVAIVLFLVLLVSSMVLAIYINAWVGK 194 

Query: 181 AIIRSANRKLISTILISLATFGAIVPLLFVNMTSQK--MVQGKLQDIAPIPYVRGYYDIV 238 

I+RS RKLISTI++ ++TFGA V + +N+++ K M G D IPY +G+YD+V 
Sbjct: 195 IIVRSRKRKLISTIMMFVSTFGAFVLIFAINISNNKRTMTDGVFTDYPTIPYFKGFYDW 254 

Query: 239 TAPFSMESLLNYYLPLLIILFLIGAIYKWVMPRYYQELLY GQVKQRK- -VHRQIDF 292 

APFS +LLN++LPLL+IL ++ I VMP YY+E Y +VKQ K V+R 

Sbjct: 255 QAPFSTAALLNFWLPLLLILA^GIOTKVMPTYYREAFYISMNKVKQTKKPVITOP--- 311 

Query: 293 SKRES INKTLVKHHLS SLQNATLLTNTFLMPLLYLAMFIVP I LNNGKE IGRFFNENYFGI 352 

+ +S+ + L KHHL +LQNATILT T+LMPL+Y+ +FI P L+ G + + +YFG+ 
Sbjct: 312 HQNQSLAQLLRKHHLLTLQNATLLTQTYLMPIjMYVMLFIGPSLSRGTGFFKHISPDYFGV 371 

Query: 353 AFLAGILIGSLCVMPASIVGVGISLEKSNFYFIKSLPISFSYFLKHKFVTLITLQLAVPT 412 

A L G+ +G +C PS +GVGISLEK NF FIKSLPI+ ' FL KF L+ LQL VP 
Sbjct: 372 ALLFGVSLGVMCATPTSFIGVGISLEKDNFTFIKSLPITLKKFLMDKFCLLVGLQLIVPM 431 



Query: 413 FIYFLVGFFLLKLSILVLLSFILGLVPMGLIEGQFIYRRDYKHLFLNWQEVTQLFNRGLG 472 
IY + G F+L L L+ ++F LS +++G+ +YRRDY+ L L WQ++TQLF RG G 




Sbjct: 432 VIYLVFGLFVLHLHPLLTIAFCLGYALSLIVQGELMYRRDYRLLDLKWQDMTQLFTRGDG 491 

Query: 473 QWLLVGSLFGMMI IGSFL - IGI S I FWSMVWNTVAVNI I ILI IGLLILS I CQYLLLKNFWK 531 

QWL +G +FG +1+ L G I +++ + + L++L + Q + K FWK 

Sbjct: 492 QWLTMGLIFGNLIVAGVLGFGAVIIANIIQQPLLISILLSCLILMVLGIAQLWIQKTFWK 551 

Query: 532 KL 533 
L 

Sbjct: 552 SL 553 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 947 

A DNA sequence (GBSx1005) was identified in S.agalactiae <SEQ ID 2881> which encodes the amino 
acid sequence <SEQ ID 2882>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 
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Final Results 

bacterial membrane --- Certainty=0.4248 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15963 GB:Z99124 phosphotransferase system (PTS) beta-glucoside-specific enzyme IIABC component [Bacillus subtilis] 
Identities = 175/447 (39%), Positives = 266/447 (59%), Gaps = 10/447 (2%) 

Query: 4 EYITLSKNIIKHLGGQNNINNVYHCQTRLRFSIjNDPTKVNLEQLKTLKEVKTVVISGGQH 63 

+Y LSK+I++ +GG+ N+ V HC TRLRF+L+D K + QL+ L V ISG Q 
Sbjct: 2 DYDKLSKDILQLVGGEENVQRVIHCMTRLRFNLHDNAKADRSQLEQLPGVMGTNISGEQF 61 

Query: 64 QIVIGTHVAKVFEEI NSLIETNSTTKIEQTKKAKAVSRI IDFVSGTFQPILPALSGA 120 

QI+IG V KV++ I ++L + S Q K +S + D +SG F PILPA++GA 

Sbjct: 62 QIIIGNDVPKVYQAIVRHSNLSDEKSAGSSSQKKNV--LSAVFDVISGVFTPILPAIAGA 119 

Query: 121 GMIKALLALLLVFKILTPSSQTYILLNLFADGVFYFLPILIAITAAQKLKANPILALGTV 180 

GMIK L+AL + F + SQ +++L DG FYFLP+L+A++AA+K +NP +A 
Sbjct: 120 GMIKGLVAljAVTFGWMAEKSQVHVILTAVGDGAFYFLPLLLAMSAARKFGSNPYVAAAIA 179 

Query: 181 VMLLHPNWANLVASGKPVSLFHTIPFTLTNYASSVIPIILIICVQAYIEKYLKQIIPKSL 240 

+LHP+ L+ +GKP+S F +P T Y+S+VIPI+L I + +Y+EK++ + SL 
Sbjct: 180 AAILHPDLTALLGAGKPIS-FIGLPVTAATYSSTVIPILLSIWIASYVEKWIDRFTHASL 238 

Query: 241 RLVLVPMLIFLSMGILSFSILGPMGTIAGQYLAVIFTFLSKYASW-APAFLVGAFAPILI 299 

+L++VP L + L+ +GP+G I G+YL+ +L +A A FL G F+ ++I 
Sbjct: 239 KLIWPTFTLLIVVPLTLITVGPLGAILGEYLSSGVNYLFDHAGLVAMIFLAGTFS-LII 297 

Query: 300 MFGVHSGIAALGITQLAKLGVDSIFGPGMLCSNIAQATAGTVVTLITKEKKLKEIAGPAA 359 

M G+H + I +A+ G D + P M +N+ QA A VL++KKK+A + 

Sbjct: 298 MTGMHYAWPIMINNIAQNGHDYLL-PAMFLAWGQAGASFAVFLRSRNKKFKSLALTTS 356 

Query: 360 ITAYMGITEPILYGvNLPKRYPLIASLIGGGLGGLYAGIMNAHRFAV-GSSGLPGLFLYI 418 

ITA MGITEP +YGVN+ + P A+LIGG GG + G+ + V G++GLP + ++I 

Sbjct: 357 ITALMGITEPAMYGVNMRLKKPFAAALIGGAAGGAFYGMTGVASYIVGGNAGLPSIPVFI 416 



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Query: 419 SHTSTHLFITMLIAVI ITVSTTAILTF 445 

T + I ++IA S +L F 

Sbjct: 417 GPTFIYAMIGLVIAFAAGTSAAYLLGF 443 


There is also homology to SEQ ID 2884. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 948 

A DNA sequence (GBSx1006) was identified in S.agalactiae <SEQ ID 2885> which encodes the amino 
acid sequence <SEQ ID 2886>. This protein is predicted to be gamma-glutamyl kinase (proB). Analysis of 
this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -0.11 Transmembrane 160 - 176 ( 160 - 176) 

Final Results 

bacterial membrane --- Certainty=0.1044 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 






>GP:CAA63147 GB:X92418 gamma-glutamyl kinase [Streptococcus thermophilus] 
Identities = 200/265 (75%), Positives = 235/265 (88%) 

Query: 1 MKRHFETTRRIVIKVGTSSLVQTSGKINLSKIDHIAFVISSLMNRGMEVILVSSGAMGFG 60 

MKR+F++ +R+VIK+GTSSLV SGKINL KID LAFVISSL N+G+EV+LVSSGAMGFG 
Sbjct: 1 MKRNFDSVKRLVIKIGTSSLVLPSGKINLEKIDQLAFVISSLHNKGIEVVLVSSGAMGFG 60 

Query: 61 LDILKMDKRPQEISQQQAVSSVGQVAMMSLYSQIFSHYQTHVSQILLTRDVWFPESLQN 120 

L++L ++KRP E+ +QQAVSSVGQVAMMSLYSQ+FSHYQT VSQ+LLTRDW + ESL N 
Sbjct: 61 LNVLDLEKRPAEVGKQQAVSSVGQVAMMSLYSQVFSHYQTKVSQLLLTRDVVEYSESLAN 120 

Query: 121 VTNSFESLLSMGILPIVNENDAVSVDEMDHKTKFGDNDRLSAWAKITKADLLIMLSDID 180 
N+FESL +G++PIVNENDAVSVDEMDH TKFGDNDRLSA+VAK+ ADLLIMLSDID 

Sbjct: 121 AINAFESLFELGWPIVNENDAVSVDEMDHATKFGDNDRLSAIVAKWGADLLIMLSDID 180 

Query: 181 GLFDKNPNIYDDAVLRSHVSEITDDIIKSAGGAGSKFGTGGMLSKIKSAQMVFDNNGQMI 240 
GLFDKNPN+Y+DA LRS+V EIT++I+ SAGGAGSKFGTGGM+SKI KSAQMVF+N QM+ 
Sbjct: 181 GLFDKNPNVYEDATLRSYVPEITEEILASAGGAGSKFGTGGMMSKIKSAQMVFENQSQMV 240 

Query: 241 LMNGANPRDILKVLDGHNIGTYFAQ 265 

LMNG NPRDIL+VL+G IGT F Q 
Sbjct: 241 LMNGENPRDILRVLEGAKIGTLFKQ 265 


A related DNA sequence was identified in S.pyogenes <SEQ ID 2887> which encodes the amino acid 
sequence <SEQ ID 2888>. Analysis of this protein sequence reveals the following: 

Possible site: 61 
>>> Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -1.97 Transmembrane 163 - 179 ( 163 - 179) 
INTEGRAL Likelihood = -0.06 Transmembrane 124 - 140 ( 124 - 140) 

Final Results 

bacterial membrane --- Certainty=0.1786 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 
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>GP:CA&63147 GB:X92418 gatnma-glutamyl kinase [Streptococcus thermophilus] 
Identities = 212/265 (80%), Positives = 237/265 (89%) 

Query: 4 MKRQFEDVTRIVIKIGTSSLVLPTGKINLEKIDQLAFVISSLMNKGKEVILVSSGAMGFG 63 
         MKR F+ V R+VIKIGTSSLVLP+GKINLEKIDQLAFVISSL NKG EV+LVSSGAMGFG 
Sbjct: 1 ... 60 

Query: 64 LDILKMEKRPTNI^^ 123 
          L++L +EKRP + KQQAVSSVGQVAMMSLYSQ+F++YQT VSQ+LLTRDW + ESLAN 
Sbjct: 61 ... 120 

Query: 124 VTNAFESLISLGIVPIVNENDAVSVDEMDHATKFGDNDRLSAWAGITKADLLIMLSDID 183 
           NAFESL LG+VPIVNENDAVSVDEMDHATKFGDNDRLSA+VA + ADLLIMLSDID 
Sbjct: 121 AINAFESLFELGWPIVNENDAVSVDEMDHATKFGDNDRLSAIVAKWGADLLIMLSDID 180 

Query: 184 GLFDKNPTIYEDAQLRSHVANITQEIIASAGGAGSKFGTGGMLSKVQSAQMVFENKGQMV 243 
           GLFDKNP +YEDA LRS+V IT+EI+ASAGGAGSKFGTGGM+SK++SAQMVFEN+ QMV 
Sbjct: 181 GLFDKNPNVYEDATLRSYVPEITEEILASAGGAGSKFGTGGMMSKIKSAQMVFENQSQMV 240 

Query: 244 LMNGANPRDILRVLEGQPLGTWFKQ 268 
           LMNG NPRDILRVLEG +GT FKQ 
Sbjct: 241 LMNGENPRDILRVLEGAKIGTLFKQ 265 





An alignment of the GAS and GBS proteins is shown below. 

Identities = 217/265 (81%), Positives = 242/265 (90%) 

Query: 1 MKRHFETTRRIVIKVGTSSLVQTSGKINLSKIDHIAFVISSLMNRGMEVILVSSGAMGFG 60 
         MKR FE RIVIK+GTSSLV +GKINL KID LAFVI SSLMN+G EVILVSSGAMGFG 
Sbjct: 4 MKRQFEDVTRIVIKIGTSSLVLPTGKINLEKIDQIAFVISSLMNKGKEVILVSSGAMGFG 63 

Query: 61 LDILKMDKRPQEISQQQAVSSVGQVAMMSLYSQIFSHYQTHVSQILLTRDVWFPESLQN 120 
          LDILKM+KRP +++QQAVSSVGQVAMMSLYSQIF++YQT+VSQILLTRDVWFPESL N 
Sbjct: 64 LDILKMEKRPTNIAKQQAVSSVGQVAmSLYSQIFAYYQTNVSQILLTRDVWFPESLAN 123 

Query: 121 VTNSFESLLSMGILPIVNENDAVSVDEMDHKTKFGDNDRLSAWAKITKADLLIMLSDID 180 
           VTN+FESL+S+GI+PIVNENDAVSVDEMDH TKFGDNDRLSAWA ITKADLLIMLSDID 
Sbjct: 124 VTNAFESLISLGIVPIVNENDAVSVDEMDHATKFGDNDRLSAWAGITKADLLIMLSDID 183 

Query: 181 GLFDKNPNIYDDAVLRSHVSEITDDIIKSAGGAGSKFGTGGMLSKIKSAQMVFDNNGQMI 240 
           GLFDKNP IY+DA LRSHV+ IT +II SAGGAGSKFGTGGMLSK++SAQMVF+N GQM+ 
Sbjct: 184 GLFDKNPTIYEDAQLRSHVANITQEIIASAGGAGSKFGTGGMLSKVQSAQMVFENKGQMV 243 

Query: 241 LMNGANPRDILKVLDGHNIGTYFAQ 265 
           LMNGANPRDIL+VL+G +GT+F Q 
Sbjct: 244 LMNGANPRDILRVLEGQPLGTWFKQ 268 





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
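
Each database hit in these listings is introduced by a header line giving a GENPEPT accession, a GenBank accession, a description and an organism in square brackets. As a hedged illustration only, the following Python sketch splits such a header into its fields; the `Hit` type and `parse_hit_header` function are hypothetical names introduced here, and the pattern assumes the header has been joined onto a single line.

```python
import re
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    gp_accession: str
    gb_accession: str
    description: str
    organism: str


# ">GP:<acc> GB:<acc> <description> [<organism>]" on a single line.
HIT_RE = re.compile(r"^>GP:(\S+)\s+GB:(\S+)\s+(.*?)\s*\[([^\]]+)\]\s*$")


def parse_hit_header(line: str) -> Optional[Hit]:
    """Return the header fields, or None if the line has another layout."""
    m = HIT_RE.match(line.strip())
    return Hit(*m.groups()) if m else None


hit = parse_hit_header(
    ">GP:CAA63147 GB:X92418 gamma-glutamyl kinase [Streptococcus thermophilus]"
)
print(hit)
```

Returning `None` for non-matching lines, rather than raising, is a design choice made so that a whole listing can be scanned line by line.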

Example 949 

A DNA sequence (GBSx1007) was identified in S.agalactiae <SEQ ID 2889> which encodes the amino 
acid sequence <SEQ ID 2890>. This protein is predicted to be unnamed protein product (proA). Analysis of 
this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3517 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
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A related DNA sequence was identified in S.pyogenes <SEQ ID 2891> which encodes the amino acid 
sequence <SEQ ID 2892>. Analysis of this protein sequence reveals the following: 

Possible site: 55 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:CAA63148 GB:X92418 gamma-glutamyl phosphate reductase [Streptococcus thermophilus] 
Identities = 309/416 (74%), Positives = 355/416 (85%) 

Query: 1 MTDMRRLGQRAKQASLLIAPLSTQIKNRFLSTLAKALVDDTQTLLAANQKDLANAKEHGI 60 

MT + LGQ+AK AS IA LST KN L+ +AKALV ++ + N KD+ANA E+GI 
Sbjct: 1 MTYVDTLGQQAKVASRQIAKLSTAAKNDLLNQVAKALVAESDYIFTENAKDMANASENGI 60 

Query: 61 SDIMMDRLRLTSERIKAIAQGVQQVADLADPIGQVIKGYTNLDGLKILQKRVPLGVIAMI 120 

S IM DRL LT +RI IA+GV+QVADL DPIGQV++GYTNLDGLKI+QKRVP+GVIAMI 
Sbjct: 61 SKIMQDRLLLTEDRIAGIAEGVRQVADLQDPIGQWRGYTNLDGLKIVQKRVPMGVIAMI 120 

Query: 121 FESRPNVSVDAFSLAFKTNNAIILRGGKDALHENKIVLVKLIRQSLEKSGITPDAVQLVED 180 
FESRPNVS+DAFSIAFKTNNAIILRGG+DA++SNKALV + R++L+ +GIT DAVQ VED 

Sbjct: 121 FESRPNVSIDAFSLAFKTNNAIILRGGRDAINSNKALVTVARKALKNAGITADAVQFVED 180 

Query: 181 PSHAVAEELMQATDYVDVTjIPRGGAKIjIQTVKEKAKVPVIETGVGNVHIYVDAQADLDIA 240 
SH VAEELM AT YVD+LIPRGGA+LIQTVKEKAKVPVIETGVGN HIYVD A+LD+A 
Sbjct: 181 TSHEVAEELMVATKYVTJLLIPRGGARLICjTVKEKAKVPV^ 240 

Query: 241 TKIVINAKTKRPSVCNAAEGLVIHEAVAARFIPMLEKAINQVQPVEWRADDKALPLFEQA 300 

T+ IVINAKT+RPS VCNAAE LV+H + F+P LEKAI+++Q VE+RAD++AL L E+A 
Sbjct: 241 TQIVINAKTQRPSVCNAAESLWHADIVEEFLPNLEKAISKIQSVEFRADERALKLMEKA 300 


Query: 301 VPAKAEDFETEFLDYIMSVKWSSLEEAISWINQYTSHHSEAIITRDIKAAETFQDLVDA 360 

VPA EDF TEFLDYIMSVKW SL+EAI+WIN YT+ HSEAI+T+DI AE FQD VDA 
Sbjct: 301 VPASPEDFATEFLDYIMSVKVVDSLDEAINWINTYTTSHSEAIVTQDISRAEQFQDDVDA 360 

Query: 361 AAVYVNASTRFTDGFVFGLGAEIGISTQKMHARGPMGLEALTSTKFYINGDGHIRE 416 

AAVYVNASTRFTDGFVFGLGAEIGISTQKMHARGPMGLEALTSTKFYING G IRE 
Sbjct: 361 AAVYVNASTRFTDGFVFGLGAEIGISTQKMHARGPMGLEALTSTKFYINGQGQIRE 416 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 307/417 (73%), Positives = 353/417 (84%), Gaps = 1/417 (0%) 

Query: 1 MTYIEILGQNAKKASQSVARLSTASKNEILRDLARNIVADTETILTENARDWKAKDNGI 60 

MT + LGQ AK+AS +A LST KN L LA+ +V DT+T+L N +D+ AK++GI 
Sbjct: 1 MTDMRRLGQRAKQASLLIAPLSTQIKNRFLSTLAKALVnDTQTLLAANQKDLANAKEHGI 60 


Query: 61 SEIMVDRLRIiNKDRIQAIANGIYQVADIADPIGQWSGYTNLDGLKILKKRVPLGVIAMI 120 
S+IM+DRLRL +RI+AIA G+ QVADLADPIGQV+ GYTNLDGLKIL+KRVPLGVIAMI 
Sbjct: 61 SDIMMDRLRLTSERIKAIAQGVQQVADLADPIGQVIKGYTNLDGLKILQKRVPLGVIAMI 120 

Query: 121 FESRPNVSVDAFSIAFKTGNAIILRGGKDAIFSNTALVNCMRQTLQDTGHNPDIVQLVED 180 

FESRPNVSVDAFSLAFKT NAI ILRGGKDA+ SN ALV +RQ+L+ +G PD VQLVED 
Sbjct: 121 FESRPNVSVDAFSLAFKTNNAIILRGGKDALHSNKALVKLIRQSLEKSGITPDAVQLVED 180 

Query: 181 TSHVVAEELMQATDYvnVLIPRGGAKLIQTVKEKSKIPVIETGVGNVHIYIDEFADLDMA 240 
SH VAEELMQATDYVBVIjIPRGGAKLIQTVKEK+K+PVIETGVGNVHIY+D ADLD+A 

Sbjct: 181 PSHAVAEELMQATDYTOVLIPRGGAKLIQTVKEKAKyPVIETGVGNVRIYvDAQADLD 240 



Query: 241 AKIVINAKTQRPSVCNAAEGLWHQAIAKGFLSQLEKMLKESNQSVEFRADEEALQLLEN 300 
KIVINAKT+RPSVCNAAEGLV+H+A+A F+ LEK + + Q VE+RAD++AL L E 
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Sbjct: 241 TKIVINAKTKRPSVOJAAEGLVIHEAVAARFIPMLEKAINQV-QPVEWRADDKALPLFEQ 299 

Query: 301 AVAASESDYATEFLDYIMSVroA/DSFEQAISWINKYSSHHSEAIITNNISRAEIFQDMVD 360 

AV A D+ TEFLDYIMSVKW S E+AISWIN+Y+SHHSEAIIT +1 AE FQD+VD 
Sbjct: 300 AVPAKAEDFETEFLDYIMS VTOTVSSLEEAISWINQYTSHHSEAI ITRDI KAAETFQDL VD 359 

Query: 361 AAAVYWASTRFTDGFVFGLGAEIGISTQKLHARGPMGLEALTSTKYYINGTGQWE 417 

AAAVYVNASTRFTDGFVFGLGAEIGISTQK+HARGPMGLEALTSTK+YING G +RE 
Sbjct: 360 AAAVYVNASTRFTDGFVFGLGAEIGISTQKMHARGPMGLEALTSTKFYINGDGHIRE 416 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 950 

A DNA sequence (GBSx1008) was identified in S.agalactiae <SEQ ID 2893> which encodes the amino 
acid sequence <SEQ ID 2894>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1859 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9531> which encodes amino acid sequence <SEQ ID 9532> 
was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2895> which encodes the amino acid 
sequence <SEQ ID 2896>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0853 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 259/315 (82%), Positives = 287/315 (90%) 

Query: 1 MTNDFHHITVLLHETVDMLDIKPDGIYVDATLGGAGHSEYLLSQLGPDGHLYAFDQDQKA 60 
         MT +FHH+TVLLHETVDMLDIKPDGIYVDATLGG+GHS YLLS+LG +GHLY FDQDQKA 
Sbjct: 22 MTKEFHHVTVLLHETVDMLDIKPDGIYVDATLGGSGHSAYLLSKLGEEGHLYCFDQDQKA 81 

Query: 61 IDNAHIRLKKYVDTGQVTFIKDNFRNLSSNLKALGVSEINGICYDLGVSSPQLDERERGF 120 
          IDNA + LK Y+D GQVTFIKDNFR+L + L ALGV EI+GI YDLGVSSPQLDERERGF 
Sbjct: 82 IDNAQVTLKSYIDKGQVTFIKDNFRHLKARLTALGVDEIDGILYDLGVSSPQLDERERGF 141 

Query: 121 SYKQDAPLDMRMNREQSLTAYDVVNTYSYHDLVRIFFKYGEDKFSKQIARKIEQVRAEKT 180 
           SYKQDAPLDMRM+R+ LTAY+WNTY ++DLV+IFFKYGEDKFSKQIARKIEQ RA K 
Sbjct: 142 SYKQDAPLDMRMDRQSLLTAYEVVNTYPFNDLVKIFFKYGEDKFSKQIARKIEQARAIKP 201 

Query: 181 ISTTTELAEIIKSSKSAKELKKKGHPAKQIFQAIRIEVNDELGAADESIQQAMDLLAVDG 240 
           I TTTELAE+IK++K AKELKKKGHPAKQIFQAIRIEVNDELGAADESIQ AM+LLA+DG 
Sbjct: 202 IETTTELAELIKAAKPAKELKKKGHPAKQIFQAIRIEVNDELQAADESIQDAMELLALDG 261 

Query: 241 RISVITFHSLEDRLTKQLFKEASTVEVPKGLPFIPDDLQPKMELVNRKPILPSQEELEAN 300 
           RISVITFHSLEDRLTKQLFKEASTV+VPKGLP IP+D++PK ELV+RKPILPS EL AN 
Sbjct: 262 RISVITFHSLEDRLTKQLFKEASTVDVPKGLPLIPEDMKPKFELVSRKPILPSHSELTAN 321 
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Query: 301 NRAHSAKLRVARRIR 315 

RAHSAKLRVA++IR 
Sbjct: 322 KRAHSAKLRVAKKIR 336 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 951 

A DNA sequence (GBSx1009) was identified in S.agalactiae <SEQ ID 2897> which encodes the amino 
acid sequence <SEQ ID 2898>. This protein is predicted to be FtsL. Analysis of this protein sequence 
reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -8.92 Transmembrane 30 - 46 ( 24 - 49) 

Final Results 

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC95455 GB:AF068903 YllD [Streptococcus pneumoniae] 
Identities = 44/99 (44%), Positives = 71/99 (71%) 

Query: 5 KRTEAVTQTLQRHIKTFSRIEKAFYGAIVITAIIMAVGIIYLQSNSLQVKQEVNQLNSKI 64 
         ++ E Q LQ +K FSR+EKAFY +I +T +I+A+ II++Q+ LQV+ ++ ++N++I 
Sbjct: 3 EKMEKTGQILQMQLKRFSRVEKAFYFSIAVTTLIVAISIIFMQTKLLQVQNDLTKINAQI 62 

Query: 65 NDKQTEFDNAKQEVNELSNRDRITKIAKDAGLTIQNDNI 103 
          +K+TE D+AKQEVNEL +R+ +IA L + N+NI 
Sbjct: 63 EEKKTELDDAKQEVNELLRAERLKEIANSHDLQLNNENI 101 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2899> which encodes the amino acid 
sequence <SEQ ID 2900>. Analysis of this protein sequence reveals the following: 

Possible site: 50 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -5.79 Transmembrane 40 - 56 ( 37 - 58) 

Final Results 

bacterial membrane --- Certainty=0.3314 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAC95455 GB:AF068903 YllD [Streptococcus pneumoniae] 
Identities = 45/94 (47%), Positives = 69/94 (72%) 

Query: 24 LQKRIKTFSRIEKAFYTAIIVTAITMAVSIIYLQSRKLQLQQEITSLNSHISDQKLELNN 83 

LQ ++K FSR+EKAFY +1 VT + +A+SII++Q++ LQ+Q ++T +N+ I ++K EL++ 
Sbjct: 12 LQMQLKRFSRVEKAFYFSIAVTTLIVAISIIFMCTKLLQVQNDLTKINAQIEEKKTELDD 71 






Query: 84 AKQEVNELSRRDRIIDIAGKAGLSNRNNNIKKVE 117 

AKQEVNEL R +R+ +IA L N NI+ E 
Sbjct: 72 AKQEVNELLRAERLKEIANSHDLQLNNENIRIAE 105 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 71/108 (65%), Positives = 87/108 (79%), Gaps = 1/108 (0%) 
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Query: 1 MTNEKRTEAVTQTLQRHIKTFSRIEKAFYGAIVITAIIMRVGIIYLQSNSLQVKQEVNQL 60 

MTNEKRT+ VT LQ+ IKTFSRIEKAFY AI++TAI MAV IIYLQS LQ++QE+ L 
Sbjct: 11 MTNEKRTQWTNALQKRIKTFSRIEKAFYTAIIVTAITMAVSIIYLQSRKLQLQQEITSL 70 

Query: 61 NSKINDKQTEFDNAKQEVNELSNRDRITKIAKDAGLTIQNDNIYRKVD 108 

NS I+D++ E +NAKQEVNELS RDRI IA AGL+ +N+NI +KV+ 
Sbjct: 71 NSHISDQKLELNNAKQEVNELSRRDRIIDIAGKAGLSNRNNNI-KKVE 117 

SEQ ID 2898 (GBS82) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 15 (lane 2; 2 bands). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
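
Predicted membrane-spanning segments are reported on "INTEGRAL" lines with a likelihood value and two residue ranges. The sketch below, in Python, reads such lines into records and orders them by likelihood. It is offered as an illustration only: the names `TMSegment` and `parse_integral_lines` are hypothetical, and the reading of the first range as the predicted segment and the parenthesised range as its outer limits is an assumption, as is the ordering from most negative to least negative value.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TMSegment:
    likelihood: float
    start: int        # first range on the line (assumed predicted segment)
    end: int
    outer_start: int  # parenthesised range (assumed outer limits)
    outer_end: int


INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_integral_lines(text: str) -> list:
    """Return all INTEGRAL records in text, most negative likelihood first."""
    segments = [
        TMSegment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in INTEGRAL_RE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.likelihood)


example = """
INTEGRAL Likelihood = -8.92 Transmembrane 30 - 46 ( 24 - 49)
INTEGRAL Likelihood =-13.90 Transmembrane 37 - 53 ( 30 - 60)
"""
segments = parse_integral_lines(example)
for seg in segments:
    print(seg)
```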

Example 952 

A DNA sequence (GBSx1010) was identified in S.agalactiae <SEQ ID 2901> which encodes the amino 
acid sequence <SEQ ID 2902>. Analysis of this protein sequence reveals the following: 

Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1435 (Affirmative) < succ> 
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 953 

A DNA sequence (GBSx1011) was identified in S.agalactiae <SEQ ID 2903> which encodes the amino 
acid sequence <SEQ ID 2904>. This protein is predicted to be unnamed protein product. Analysis of this 
protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-13.90 Transmembrane 37 - 53 ( 30 - 60) 

Final Results 

bacterial membrane --- Certainty=0.6562 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2905> which encodes the amino acid 
sequence <SEQ ID 2906>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-13.06 Transmembrane 33 - 49 ( 24 - 53) 

Final Results 

bacterial membrane --- Certainty=0.6222 (Affirmative) < succ> 
bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
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bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 480/753 (63%), Positives = 603/753 (79%), Gaps = 8/753 (1%) 

Query: 5 KKLKKIFLDYVIHIRDRRSPQKNRERVGQNLMILTIFLFFIFIINFVIIVGTDSKFGVNL 64 

KK +K LDYV+ RDRR+P +NR RVGQN+M+LTIF+FFI FI INF+ I I +GTD KFGV+L 
Sbjct: 2 KKWQKYVLDYW--RDRRTPVENRVRVGQNMMLLTIFIFFIFI INFMIIIGTDQKFGVSL 59 

Query: 65 SKEAKKVYQQSMTVQAKRGTIYDRNGNPIAEDATTYSLYAIISKNYTTATGQKLYVQPSQ 124 

S+ AKKVYQ+++T+QAKRGTIYDRNG IA D+TTYS+YAI+ K++ +A+ +KLYVQPSQ 
Sbjct: 60 SEGAKKvYQETVTIQAKRGTIYDRNGTAIAVDSTTYSIYAILDKSFVSASDEKLYVQPSQ 119 

Query: 125 YEKVASILENKLGMKKNLVLKQLNQKKLFQVSFGSSGSGLSYTKMADIKKTMEKSDIKGI 184 
YE VA IL+ LGMKK V+KQL +K LFQVSFG SGSG+SY+ M+ I+K ME + IKGI 

Sbjct: 120 YETVADILKKHLGMKKTDVIKQLKRKGLFQVSFGPSGSGISYSTMSTIQKAMEDAKIKGI 179 

Query: 185 GFSTS PGR1 YPNGI FASQF IGF - TLPQDDGDG - KKLVGNTGLEAALNKVLSGTDGKVTYE 242 
F+TSPGR+YPNG FAS+FIG +L +D G K LVG TGLEA+ +K+LSG DG +TY+ 
Sbjct: 180 AFTTSPGRMYPNGTFASEFIGLASLTEDKKTGVKSLVGKTGLEASFDKILSGQDGVITYQ 239 

Query: 243 KDRSGNVLLGTATTERRAVNGKDIYTTLSEPIQTVIjETQMDVFAEKTKGKFASATVVNAK 302 

KDR+G LLGT T ++A++GKDIYTTLSEPIQT LETQMDVF K+ G+ ASAT+VNAK 
Sbjct: 240 KDRNGTTLLGTGKTVKKAIDGKDIYTTLSEPIQTFLETQMDVFQAKSNGQLASATLVNAK 299 


Query: 303 TGEIIATSQRPTYNPSTIiKGYDKKNLGTYNTLLYDNFFEPGSTMKVMTLASAIDSKHFNS 362 

TGEILAT+QRPTYN TLKG + N Y+ L N FEPGSTMKVMTLA+AID K FN 
Sbjct: 300 TGEIIATTQRPTYNADTLKGLEirarYKWYSAI^CGN-FEPGSTMKvMTLAAAIDDKVFNP 358 

Query: 363 TEVYNSAQ-YKIADAIIRDWDVNEGLSSGSYMTFPQGFAHSSNVGMVTLEQKMGRDKWLN 421 

E +++A I ADA I+DW +NEG+S+G YM + QGFA SSNVGM LEQKMG KW+N 
Sbjct: 359 NETFSNANGLTIADATIQDWS INEGI STGQYMNYAQGFAFS SNVGMTKLEQKMGNAKWMN 418 

Query: 422 YLSKFKFGYPTRFGMLHESGGLFPSDNEVTIAMSSFGQGIGVTQVQMLRAFTSISNDGVM 481 
YL+KF+FG+PTRFG+ E G+FPSDN VT AMS+FGQGI VTQ+QMLRAFT+ISN+G M 

Sbjct: 419 YLTKFRFGFPTRFGLKDEDAGI FPSDNIVTQAMSAFGQGI S VTQIQMLRAFTAI SNNGEM 478 

Query: 482 LQPQFISSIYDPNTGTSRTARKEWGKPVSKEAASKTRDYMVTVGTDPYYGTLYA-AGAP 540 
L+PQFIS IYDPNT + RTA KE+VGKPVSK+AAS+TR YM+ VGTDP +GTLY+ P 
Sbjct: 479 LEPQFISQIYDPNTASFRTANKEIVGKPVSKKAASETRQYMIGVGTDPEFGTLYSKTFGP 538 

Query: 541 VIQVGNQSVAVKSGTAQIAQEGGGGYLQ-GKNDTINSWAMVPSENPDFIMYVTIQQPEK 599 

+ I+VG+ VAVKSGTAQI E G GY G + + SWAMVP++ PDF+MYVT+ +P+ 
Sbjct: 539 IIKVGDLPVAvXSGTAQIGSEDGSGYQDGGLTNYvYSWA^IVPADKPDFLMYVTMTKPQH 598 


Query: 600 FSITFWKDvWPvLEQATAMKETILKPGLNDSEHQTKYKLSKIVGENPGHVAEELRRNLV 659 

F FW+DWNPVLE+A M++T+ KP ++D+ QT YKL VG+NPG + EDRRNLV 
Sbjct: 599 FGPLFWQDWNPVLEEAYLMQDTLTKPWSDANRQTTYKLPNFVGKNPGETSSELRRNLV 658 

Query: 660 QPIILGNGSKySKVSKRPGANLAENEQLLVLimLTELPDmGWSKANVEQFAKWTGIKV 719 

QP++LG GSK+ KVS +PG L EN+Q+L+L+++ E+PDMYGW+K+NV+ FAKWTGI + 
Sbjct: 659 QPVVLGTGSKIKCTSHQPGQTLTENQQVIjILSDRFvEVPDMYGWTKSNVKTFAKWTGIDI 718 

Query: 720 TYKGSTSGKVRKQSIDVGKSINKIKKIKITIGD 752 
++KG+ SG+V KQS+DVGKS+ KIKK+ IT+GD 

Sbjct: 719 SFKGTDSGRVMKQS VDVGKSLKKI KKMT I TLGD 751 

A related GBS gene <SEQ ID 8691> and protein <SEQ ID 8692> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -4.31 
GvH: Signal Score (-7.5): -7.07 
Possible site: 47 

>>> Seems to have no N-terminal signal sequence 
ALOM program count: 1 value: -13.90 threshold: 0.0 
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INTEGRAL Likelihood =-13.90 Transmembrane 37 - 53 ( 30 - 60) 
PERIPHERAL Likelihood = 5.30 450 
modified ALOM score: 3.28 

5 *** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0 . 6562 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

10 bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

ORF00411(301 - 2556 of 2856)

GP|6779111|emb|CAB70457.1||A94911(1 - 752 of 752) unnamed protein product {unidentified},
homology to penicillin-binding protein 2x (S. pneumoniae)

%Match = 77.4

%Identity = 99.7 %Similarity = 99.9

Matches = 750 Mismatches = 1 Conservative Sub.s = 1
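
The identity and similarity percentages quoted above are consistent with simple ratios of the listed counts. The following is a minimal sketch, not the scoring program used for the listing: the function name is hypothetical, and it assumes that the aligned length equals the sum of the three counts (that is, no gap columns), which holds for this 752-residue hit but not necessarily for gapped alignments. The %Match value is not reproduced because its definition is not given here.

```python
def percent_identity_similarity(matches: int, mismatches: int,
                                conservative: int) -> tuple[float, float]:
    """Return (%identity, %similarity), each rounded to one decimal place.

    Assumption: aligned length = matches + mismatches + conservative
    substitutions, i.e. the alignment has no gap columns.
    """
    aligned = matches + mismatches + conservative
    if aligned == 0:
        raise ValueError("alignment has no scored columns")
    identity = 100.0 * matches / aligned
    # Similarity counts conservative substitutions as well as exact matches.
    similarity = 100.0 * (matches + conservative) / aligned
    return round(identity, 1), round(similarity, 1)


# Counts taken from the listing: 750 matches, 1 mismatch, 1 conservative sub.
print(percent_identity_similarity(750, 1, 1))  # (99.7, 99.9)
```
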

66 96 126 156 186 216 246 276

RIEKAFYGAIVITAIIMAVGIIYLQSNSLQVKQEVNQmSKINDKQTEFDNAKQEVNELSNRDRITKIAKDAGLTIQNDN 

306 336 366 396 426 456 486 516 

IYRKVD*SVTFFKKLKKIFLDYVIHIRDRRSPQKNRERVGQNLMILTIFLFFIFIINFVIIVGTDSKFGVNLSKEAKKVY 

25 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1| 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

OTFFKmKKIFLDYVlHIRDRRSPQKNRERVGQNLMILTIFLFFIFIINFVlIVGTDSKFGVmiSKEAKKVY 
10 20 30 40 50 60 70 

546 576 606 636 666 696 726 756 

QQSMTVQAKRGTI YDRNGNPIAEBATTYSLYAI I SKNYTTATGQKLYVQPSQYEKVAS I LENKLGMKKNLVLKQLNQKKL

iiiiiiiiiiimiiiimimiiiiiiiimmiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii 

QQSMTVQAKRGTIYDRNGNPIAEDATTYSLYAIISKNinTATGQ 

90 100 110 120 130 140 150 

786 816 846 876 906 936 966 996

FQVSFGSSGSGLSYTKMADIKKTMEKSDIKGIGFSTSPGRIYPNGIFASQFIGFTLPQDDGDGKKLVGNTGLFAALNKVL 

II I I I I I I I 11 I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
FQVSFGSSGSGLSYTKMADIKKTMEKSDIKGIGFSTSPGRIYPNGIFASQFIGFTLPQDDGDGKIQ^VGNTGLFJViLNKVL 

170 180 190 200 210 220 230 


1026 1056 1086 1116 1146 1176 1206 1236 

SGTDGKVTYEKDRSGNVIiLGTATTERRAWGKDIYTTLSEPIQTvXETQMDVFAEKTKGKFASATVVNAKTGEILATSQR 

iiiiiiMiiiiiiiiiiiiiiitiiniiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiniiiiiii 

SGTDGKVTYEKDRSGN^LGTATTERRATOGKDIYTTLSEPIQTVLETQMDVFAEKTKGKFASATWNAKTGEILATSQR 
250 260 270 280 290 300 310

1266 1296 1326 1356 1386 1416 1446 1476 

PTYNPSTLKGYDKKNLGTYNTLLYDNFFEPGSTMKVMTLASAIDSKHFNSTEVYNSAQYKIADAIIRDWDVNEGLSSGSY 

imimiiiimiimmmmiimiimiMimmiiiiiimmiihimmiiiimi 

PTYNPSTLKGYDKKNLGTYNTLLYDNFFEPGSTMKVMTIjASAIDSKHFNSTEvYNSAQYKIADAVIRDVroVNEGLSSGSY

330 340 350 360 370 380 390 






1506 1536 1566 1596 1626 1656 1686 1716 

MTFPQ^FAHSSNVGMVTLEQKMGRDKMjNYLSKFKFGYPTRFGM 

mimmiimmiiiimmmmiiiiiiiiimmimiimiimmmiimiiim 

MTFPQj3FAHSSNVGMVTLEQKMGRDKOTiNYLSKFKFGYPTRFGMIiHESG^ 

410 420 430 440 450 460 470 



1746 1776 1806 1836 1866 1896 1926 1956 

TSISNDGVMLQPQFISSIYDPNTGTSRTARKEWGKPVSKEAASraOTlYMvTVGTDPYYGTLYAAGAPVIQVGNQSVAVK

mimmiiimiimmimiiiimmiiiiiiiimmiiiiimmmiiiiimmiii 

TSISNDGVMLQPQFISSIYDPNTGTSRTARKEWGKPVSKEAASKTRDYMVTVGTDPYYGTLYAAGAPVIQVGNQSVAVK 
490 500 510 520 530 540 550 

1986 2016 2046 2076 2106 2136 2166 2196

SGTAQIAQEGGGGYLQGKNDTINSWAMVPSENPDFIMYVTICMPEKFS1TFWKDVVNPVXEQATAMKETILKPGITOSE 

mimmimmiimiiimmmimmiiiiimmmiiiiiiiiimmmi urn
SGTAQIAQEGGGGYLQGKNDTINSWAMVPSENPDFIMYVTIQQ^^

570 S80 590 600 610 620 630 

2226 2256 2286 2316 2346 2376 2406 2436 

HQTKYKLSKIVGENPGHVAEELRRNLVQPIILGNGSKVSKVS

liiiiiiiiiiiiiiiiiiiiiiiiiiiininiiiiiiiiiiiiiiiiiniiiiiiiiiiiiiiiiiiniiiiiii 

HQTKYKLSKIVGENPGHVAEELRRNLVQPIILGNGSKVSKVSKRPGANIAENEQ 

650 660 670 680 690 700 710 

2466 2496 2526 2556 2586 2616 2646 2676

KWTGIKVTYKGSTSGKVRKQSIDVGKSINKIKKIKIT^ 

llllllllllllllllllllllllllimilllllllll 
KWTGIKVTYKGSTSGKVRKQSIDVGKSINKIKKIKTTIGD 

730 740 750 

SEQ ID 8692 (GBS352d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 145 (lanes 15 & 16; MW 105.5kDa). It was also expressed in E.coli as a
His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 145 (lanes 17 & 18; MW
80.5kDa), in Figure 182 (lane 3; MW 80kDa) and in Figure 185 (lane 4; MW 105kDa). Purified
GBS352d-GST is shown in lane 5 of Figure 236.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 954 

A DNA sequence (GBSx1012) was identified in S.agalactiae <SEQ ID 2907> which encodes the amino
acid sequence <SEQ ID 2908>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1950 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 955 

A DNA sequence (GBSx1013) was identified in S.agalactiae <SEQ ID 2909> which encodes the amino
acid sequence <SEQ ID 2910>. This protein is predicted to be unnamed protein product (mraY). Analysis
of this protein sequence reveals the following:

Possible site: 18

>>> Seems to have a cleavable N-term signal seq.



INTEGRAL Likelihood =-15.12 Transmembrane 56 - 72 ( 47 - 76)
INTEGRAL Likelihood =-14.70 Transmembrane 203 - 219 ( 198 - 223)
INTEGRAL Likelihood = -6.69 Transmembrane 318 - 334 ( 315 - 335)
INTEGRAL Likelihood = -6.64 Transmembrane 83 - 99 ( 79 - 103)
INTEGRAL Likelihood = -5.52 Transmembrane 179 - 195 ( 175 - 197)
INTEGRAL Likelihood = -5.31 Transmembrane 232 - 248 ( 230 - 249)
INTEGRAL Likelihood = -3.08 Transmembrane 119 - 135 ( 119 - 137)
INTEGRAL Likelihood = -2.87 Transmembrane 151 - 167 ( 147 - 167)
INTEGRAL Likelihood = -2.34 Transmembrane 254 - 270 ( 254 - 270)



WO 02/34771 PCT/GB01/04789



Final Results

bacterial membrane --- Certainty=0.7050 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 2911> which encodes the amino acid
sequence <SEQ ID 2912>. Analysis of this protein sequence reveals the following:



Possible site: 36
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB70458 GB:A94911 unnamed protein product [unidentified] 
Identities = 244/309 (78%) , Positives = 273/309 (87%) , Gaps = 1/309 (0%) 

Query: 1 LKKIGGQQMHEDVKQHIjAKACTPTMGGTVFLLVATAVSLLVSIiF-SIKNTQSLALISGIL 59 

LKKIGGQQMHEDVKQHLAKAGTPTMGGTVFL+VA VSL+ S+ S +N+ +L GIL 
Sbjct: 28 LKKIGGQQMHEDVKQHIAKAGTPTMGGTVFLVVftLLVSLIFSIIIiSKENSGNLGATFGIL 87 

Query: 60 SIWIYGIIGFLDDFLKIFKQINEGLTAKQKLALQLVGGLMFYFLHVSPSGISSINVFGY 119 

S+V+IYGIIGFLDDFLKIFKQINEGLT KQK++LQL+ GL+FYF+HV PSG S+IN+FG+ 
Sbjct: 88 SWLIYGIIGFLDDFLKIFKQINEGLTPKQKMSLQLIAGLIFYFVHVLPSGTSAINIFGF 147 

Query: 120 QLPLGIFYLFFVLFWWGFSNAVNLTDGIDGLASISWISLVTYGVIAYVQSQFDVLLLI 179 

L +G Y FFVLFWWGFSNAVNLTDGIDGLASISWISL+TYG+IAY Q+QFD+LL+I 
Sbjct: 148 NLEVGYLYAFFVLFWWGFSNAVNLTDGIDGLASISWISLITYGIIAYNQTQFDILLII 207 

Query: 180 GAMIGALLGFFCFNHKPAKVFMGDVGSLALGAMLAAI S IALRQEWTLLI IGIVYVLETSS 239 

MIGALLGFF FNHKPAKVFMGDVGSIALGAMLAAISIALRQEWTLL IG VYV ETSS 
Sbjct: 208 VIMIGALLGFFVFNHKPAKVFMGDVGSLALGAMliAAI S IALRQEWTLLFIGFVYVFETSS 267 

Query: 240 VMLQVSYFKYTKKKYGEGRRIFRMTPFHHHLELGGLSGKGKKWSEWQVDAFLWGVGSLAS 299 

VMLQV+YFKYTKKK G G+RIFRMTPFHHHLELGG+SGKG KWSEW+VDAFLW +G S 
Sbjct: 268 VMLQ VAYFKYTKKKTG VGKRI FRMTPFHHHLELGGVSGKGNKWSEWKVDAFLWAIGI FMS 327 

Query: 300 LLVLAILYV 308 

+ LAILY+ 
Sbjct: 328 AITLAILYL 336 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 244/309 (78%) , Positives = 273/309 (87%) , Gaps = 1/309 (0%) 

Query: 28 LKKIGGQQMHEDWQHLAKAGTPTMGGTVFLIVALLVSLIFSIILSKENSGNLGATFGIL 87

LKKIGGQQMHEDVKQHLAKAGTPTMGGTVFL+VA VSL+ S+ S +N+ +L GIL 
Sbjct: 1 LKKIGGQQMHEDVKQHLAKAGTPTMGGTVFLLVATAVSLLVSLF-SIKNTQSLALISGIL 59 

Query: 88 SWLIYGIIGFLDDFLKIFKQINEGLTPKQKMSLQLIAGLIFYFVHVLPSGTSAINIFGF 147 

S+V+IYGIIGFLDDFLKIFKQINEGLT KQK++LQL+ GL+FYF+HV PSG S+IN+FG+ 
Sbjct: 60 SIWIYGIIGFLDDFLKIFKQINEGLTAKQKLALQLVGGLMFYFLHVSPSGISSINVFGY 119

Query: 148 YLEVGYLYAFFVLFWWGFSNAVNLTDGIDGLASISWISLITYGIIAYNQTQFDILLII 207

L +G Y FFVLrWVVGFSNAvNIjTDGIDUIiASIS U+QFD+LL+I
Sbjct: 120 QLPLGI FYLFFVLFWWGFSNAVNLTM^ 179

Query: 208 VIMIGALLGFWFNHKPAKVFMGDVGSLALGAMIAAISIALRQEV^LLFIGFVYVFETSS 267

Sbjct: 180 GAMIGALLGFFCFNHKPAKVFMGDVGSIALGAMLAAISIALRQEWTLLIIGIVYVLETSS 239

Query: 268 VMLQVAYFKYTKKKTGVGKRIFRMTPFHHHLELGGVSGKGNKWSEWKVDAFLWAIGIFMS 327

VMLQV+YFKYTKKK G G+RI FRMTPFHHHLELGG+SGKG KWSEW+VDAFLW +G S
Sbjct: 240 VMLQVSYFKYTKKKYGEGRRIFRMTPFHHHLELGGLSGKGKKWSEWQVDAFLWGVGSLAS 299

Query: 328 AITIAILYL 336

+ LAILY+
Sbjct: 300 LLVLAILYV 308





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 956 

A DNA sequence (GBSx1014) was identified in S.agalactiae <SEQ ID 2913> which encodes the amino
acid sequence <SEQ ID 2914>. This protein is predicted to be autoaggregation-mediating protein (deaD).
Analysis of this protein sequence reveals the following:



Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3018 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14444 GB:Z99116 similar to ATP-dependent RNA helicase 
[Bacillus subtilis] 
Identities = 215/436 (49%) , Positives = 310/436 (70%) , Gaps = 5/436 (1%) 

Query: 3 FKDFNFKPYIQRALDELKFVDPTDVQAKLIPWRSGRDLVGESKTGSGKTHTFLLPIFEK 62
F+ + KP+I A+ L F +PTD+Q +LIP V ++G+S+TG+GKTH +LLP+ K
Sbjct: 6 FELYELKPFIIDAVHRLGFYEPTDIQKRLIPAVLKKESVIGQSQTGTGKTHAYLLPLLNK 65

Query: 63
+D + D VQWITAP+REL QIYQ +1 + E +IR ++GGTDK + I+KLK+
Sbjct: 66

Query: 122
QPH+V+GTPGRI DL+K L++HKA + V+DEAD+ LDMGFL VD I +P+D+Q+L
Sbjct: 125

Query: 182
VFSATTP+KL+PFLKKY+ NP ++ V A I++ L+ +K RDK+ + ++
Sbjct: 185

Query: 242
PYL ++F NTK AD + YL+ G+K+ +HGG+ PRERK++M Q+ +LEF YI+ATDL
Sbjct: 245

Query: 302
AARGIDI+GVSHVIN +P DL F+VHRVGRT R G SG A+T+Y+ +D+ + LEK+G
Sbjct: 305

Query: 362 421
IF ++ GE++ DR RR R+K+ + D E+ + KK KK+KPGYKKK+ ++++
Sbjct: 365 IEFEYLELEKGEWKKGDDRQRRKKRKKTENEJU3-EIAHRLVKKPKKVKPGYKKKMSYEME 423

Query: 422 EKRRKERRASNRAKGR 437 

+ ++K+RR N++K R 
Sbjct: 424 KIKKKQRR- -NQSKKR 437 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2915> which encodes the amino acid 
sequence <SEQ ID 2916>. Analysis of this protein sequence reveals the following: 

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2315 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 382/447 (85%) , Positives = 420/447 (93%) 

Query: 1 MSFKDFNFKPYIQRALDELKFVDPTDVQAKLIPWRSGRDLVGESKTGSGKTHTFLLPIF 60 

MSFKD++FK Y+Q+AL+E+ FV+PT+VQ +LIP+V SGRDL VGESKTGSGKTHTFLLPI F 
Sbjct: 1 MSFKDYHFKQYVQQALEEIGFVNPTEVQKRLIPIVNSGRDLVGESKTGSGKTHTFLLPIF 60 

Query: 61 EKLDESSDDVQWITAPSRELGTQIYQATKQIAEHSEQEIRWNYVGGTDKLRQIEKLKV 120 

EKLDE+ +VQWITAPSREL TQI+ A KQIA+H ++EIR+ NYVGGTDKLRQIEKLK 
Sbjct: 61 EKLDFAKAEVQWITAPSRELATQIFDACKQIAKHFQEEIRLANYVGGTDKLRQIEKLKD 120 

Query: 121 SQPHIVIGTPGRIYDLVKSGDLAIHKAHTFVVDEADMTIiDMGFLDTVDKIAGSLPKDVQI 180 

SQPHIVIGTPGRIYDLVKSGDLAIHKA TFWDEADMT+DMGFLDTVDKIA SLPK VQI 
Sbjct: 121 SQPHIVIGTPGRIYDLVKSGDLAIHKATTFVVDEADMTMDMGFIiDTvDKIAASLPKSVQI 180 

Query: 181 LVFSATIPQKLQPFLKKYLTNP\7MEKIKTATVIADTIDNWLLSTKGRDKNAQILELSKLM 240 

LVFSATIPQKLQPFLKKYLTNPV+E+IKT TVIADTIDNWL+STKGRDKN Q+LE+ K M 
Sbjct: 181 LVFSATIPQKLQPFLKKYLTNPVIEQIKTKTVIADTIDNWLVSTKGRDKNGQLLEILKTM 240 

Query: 241 QPYLAMIFVNTKERADELHSYLSSNGLKVAKIHGGIAPRERKRIMNQVKNLEFEYIVATD 300 

QPY+AM+FVNTKERAD+LH++L++NGLKVAKIHGGI PRERKRIMNQVK L+FEYIVATD 
Sbjct: 241 QPYMAMLFVNTKERADDLHAFLTANGLKVAKIHGGIPPRERKRIMNQVKKLDFEYIVATD 300 

Query: 301 LAARGIDIEGVSHVINDAIPQDLSFFVHRVGRTGRNGLSGTAITLYQPSDDSDIRELEKL 360 

IAARGIDIEGVSHVINDAIPQDLSFFVHRVGRTGRNG++GTAITLYQPSDDSDI+ELEK+ 
Sbjct: 301 LAARGIDIEGVSHVINDAIPQDLSFFVHRVGRTGRNGMAGTAITLYQPSDDSDIKELEKM 360 

Query: 361 GINFIPKVIKNGEFQDTYDRDRRNNREKSYQKLDTEMIGLVKKKKKKIKPGYKKKIQWKV 420 

GI F PKV+KNGEFQDTYDRDRR NREK+YQKLDTEMIGLVKKKKKK+KPGYKKKIQW V 
Sbjct: 361 GIAFTPKVLKNGEFQDTYDRDRRQNREKAYQKLDTEMIGLVKraKKKVKPGYKKKIQWAV 420 

Query: 421 DEKRRKERRASNRAKGRAERKAKKQSF 447 

DEKRRKERRA NRAKGRAERKAKKQ F 
Sbjct: 421 DEKRRKERRAENRAKGRAERKAKKQHF 447 
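
The identity and positive counts reported for an alignment such as the one above can be tallied from the middle (match) line of each block, where a letter marks an identical residue and "+" marks a conservative substitution. The sketch below is illustrative only: the function name is hypothetical, and because the rounding rule behind the quoted percentages is not stated here, it returns raw counts rather than percentages. The sample rows are the final block of the alignment above.

```python
def count_matches(query: str, midline: str, subject: str) -> tuple[int, int, int]:
    """Return (identities, positives, columns) for one alignment block.

    A letter on the midline marks an identical residue; '+' marks a
    conservative substitution. Positives include identities.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("the three rows must have equal length")
    identities = sum(1 for ch in midline if ch.isalpha())
    positives = identities + midline.count("+")
    return identities, positives, len(query)


# Last block of the GAS/GBS alignment (residues 421-447).
query   = "DEKRRKERRASNRAKGRAERKAKKQSF"
midline = "DEKRRKERRA NRAKGRAERKAKKQ F"
subject = "DEKRRKERRAENRAKGRAERKAKKQHF"
print(count_matches(query, midline, subject))  # (25, 25, 27)
```

Summing the per-block counts over all blocks gives totals that can be compared with the "Identities" and "Positives" figures in the alignment header.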

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 957 

A DNA sequence (GBSx1015) was identified in S.agalactiae <SEQ ID 2917> which encodes the amino
acid sequence <SEQ ID 2918>. This protein is predicted to be unnamed protein product. Analysis of this
protein sequence reveals the following:

Possible site: 19

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



There is also homology to SEQ ID 2920. 

A related GBS gene <SEQ ID 8693> and protein <SEQ ID 8694> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3

McG: Discrim Score: 8.85
GvH: Signal Score (-7.5): -1.77

Possible site: 19
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 8.12 threshold: 0.0

PERIPHERAL Likelihood = 8.12 182
modified ALOM score: -2.12

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

EGAD|126750| collagen binding protein Insert characterized

GP|1617328|emb|CAA68052.1||X99716 collagen binding protein Insert characterized

ORF00181(331 - 1089 of 1410)
EGAD|126750|135177(23 - 260 of 263) collagen binding protein {Lactobacillus reuteri}
GP|1617328|emb|CAA68052.1||X99716 collagen binding protein {Lactobacillus reuteri}
%Match = 11.2

%Identity = 35.4 %Similarity = 59.0
Matches = 69 Mismatches = 77 Conservative Sub.s = 46



177 207 237 267 297 327 357 387 

KTKFLKLLKSEISSFQAFLLI*NLYHLIRKYYYTDRF*SVRLVI*YFRRILMFKKIILSIATIAATASLAVSVQASEKVE 

::: : | : : | : || ||: : , | | 

MKFWKKALLTIAALTVGTSAGITSVSAASSAWSELVHKGE

10 20 30 40 

417 447 477 507 537 567 597 627 

LKmTDSDTAPFTYQKDGKFKGYDVDWKAVFKGSKYKVTFKTOPro 
45 | : : :|::|:|: |: |::||: HI | | | :|:: |: :||||: |: : |||::| || | 

LTIGLEGTYSPYSYRKNNKLTGFETOLGKAVAK^GLKANW 

60 70 80 90 100 110 120 

657 687 717 747 
XSRSNYAWGKKGSHYKSLSDLSGKSTEVLSGVNYAQVLENWNKN-HPN



YIKSRFALIVPTDSNIKSLKDIKGKKIIAGTGTNNANVVKKYKGNLTPNGDFASSLDMIKQGRAAGTTOSRFAWYAYSKK 
140 150 160 170 180 190 200 

789 819 849 879 909 939 969

KKPIKIKYVSGTTGVTSRLKNIESGKIDFILYDAISSDYIVKDQSLNLSVSPLKGKIGNNKDGLEY 

= I II = 

NSTKGLKMIDVSSEQDPAKISALF - 

220 


999 1029 1059 1089 1119 1149 1179 1209 

LLLPKDKKGKTLQKFINKRIKAnjKENGTIARLSKQYFGGDYVSNIDK*ISETISFIFLHVRVLRDRITEIESLEKESRRN 

:|l =1 II :| |:s:||: :||::||| I 
NKKDTAIQSSYNKALKELQQDGTVKKLSEKYFGADITE

SEQ ID 8694 (GBS8) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 2 (lane 5; MW 31kDa), Figure 63 (lane 2; MW 31.3kDa), Figure 66 (lanes 2 & 3;
MW 31kDa), in Figure 178 (lane 2; MW 31kDa), in Figure 179 (lanes 3 & 4; MW 31kDa) and in Figure 180
(lane 3; MW 31kDa). It was also expressed in E.coli as a GST-fusion product, with SDS-PAGE shown in
Figure 66 (lanes 4 & 5; MW 56kDa) and in Figure 180 (lanes 4 & 5; MW 55kDa).

GBS8-His was purified as shown in Figures 189 (lane 7), 211 (lane 3), 228 (lanes 4-5) and 230 (lanes 3-6).
Purified GBS8-GST is shown in Figure 209, lane 6.

The GBS8-His fusion product was purified (Figure 90A) and used to immunise mice (lane 2 product;
12.9ug/mouse). The resulting antiserum was used for Western blot (Figure 90B), FACS (Figure 90C), and
in the in vivo passive protection assay (Table III). These tests confirm that the protein is immunoaccessible
on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 958 

A DNA sequence (GBSx1016) was identified in S.agalactiae <SEQ ID 2921> which encodes the amino
acid sequence <SEQ ID 2922>. Analysis of this protein sequence reveals the following:

Possible site: 30
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3991 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 959 

A DNA sequence (GBSx1017) was identified in S.agalactiae <SEQ ID 2923> which encodes the amino
acid sequence <SEQ ID 2924>. This protein is predicted to be probable amino-acid ABC transporter
permease protein in idh-deor inter. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-11.62 Transmembrane 50 - 66 ( 41 - 74) 
INTEGRAL Likelihood = -0.90 Transmembrane 226 - 242 ( 226 - 242) 
INTEGRAL Likelihood = -0.53 Transmembrane 80 - 96 ( 80 - 96) 
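
The INTEGRAL lines above each give a likelihood score and a predicted transmembrane span. As a rough illustration of how such lines might be read programmatically, the following sketch extracts the score and the span and orders the segments along the sequence. The `Segment` type, the function name and the regular expression are assumptions made for this example; they are not part of the prediction program that produced the listing.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float
    start: int
    end: int


# Tolerates the uneven spacing seen in the listing, e.g. "=-11.62" and "80- 96".
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_integral_lines(text: str) -> list[Segment]:
    """Return the predicted segments sorted by start position."""
    segments = [
        Segment(float(m.group(1)), int(m.group(2)), int(m.group(3)))
        for m in _INTEGRAL.finditer(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


sample = """\
INTEGRAL Likelihood =-11.62 Transmembrane 50 - 66 ( 41 - 74)
INTEGRAL Likelihood = -0.90 Transmembrane 226 - 242 ( 226 - 242)
INTEGRAL Likelihood = -0.53 Transmembrane 80 - 96 ( 80 - 96)
"""
for seg in parse_integral_lines(sample):
    print(seg)
```

The bracketed outer range at the end of each line is ignored in this sketch; it could be captured with two further groups if needed.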

Final Results

bacterial membrane --- Certainty=0.5649 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15985 GB:Z99124 similar to amino acid ABC transporter
(permease) [Bacillus subtilis]
Identities = 90/224 (40%), Positives = 137/224 (60%), Gaps = 10/224 (4%)

Query: 28 WKAVLDAIPSILERLPITLLLTVAGALFGLILALIFAVVKINRVKILYPIQALFVSFLRG 87

W+ ++ A P++++ LPITL + +A +F +1 LI A++ N++ +L+ + L++SF RG 
Sbjct: 6 WEFMISAFPTLIQALPITLFMAIAAMIFAIIGGLILALITKNKIPVLHQLSKLYISFFRG 65 

Query: 88 TPILVQLMLSYYGIPLFLKFmQKYGFDTOINAIPASVFAITAFAFNEAAYTSETIRAAI 147 
P LVQL L YYG+P +++ + A AI + AAY +E RAA+

Sbjct: 66 VPTLVQLFLIYYGLPQLFPEMSK MTALTAAIIGLSLKNAAYLAEIFRAAL 115

Query: 148 LSVDQGEIEAARSLGMTSAQVYRRVIIPNAAvVATPTLINTLIGLTKGTSLAFNAGIVEM 207
SVD G++EA S+GMT Q YRR+I+P A A P NT IGL K TSLAF G++EM
Sbjct: 116 NSVDDGQLEACLSVGMTKFQAYRRIILPQAIRNAIPATGNTFIGLLKETSLAFTLGVMEM 175

Query: 208 FAQAQIMGGSDYRYFERYISVALVYWAVSFLIEQLGNAIERKMA 251 

FAQ ++ + +YFE Y++VA+VYW ++ + L + ER M+ 
Sbjct: 176 FAQGKMYASGNLKYFETYLAVAIVYWVLTIIYSILQDLFERAMS 219 


A related DNA sequence was identified in S.pyogenes <SEQ ID 2925> which encodes the amino acid 
sequence <SEQ ID 2926>. Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -7.27 Transmembrane 80 - 96 ( 74 - 104)
INTEGRAL Likelihood = -1.06 Transmembrane 207 - 223 ( 207 - 223)
INTEGRAL Likelihood = -0.90 Transmembrane 110 - 126 ( 110 - 126)

Final Results

bacterial membrane --- Certainty=0.3909 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9167> which encodes the amino acid sequence
<SEQ ID 9168>. Analysis of this protein sequence reveals the following:

Possible site: 60
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.27 Transmembrane 50 - 66 ( 44 - 74)
INTEGRAL Likelihood = -1.06 Transmembrane 177 - 193 ( 177 - 193)
INTEGRAL Likelihood = -0.90 Transmembrane 80 - 96 ( 80 - 96)

Final Results

bacterial membrane --- Certainty=0.391 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 212/267 (79%), Positives = 238/267 (88%)

Query: 1 MNQFILTGGWSWYNNLVSQVPAGKLFSWKAVLDAIPSILERLPITLLLTVAGALFGLILA 60

M LT GW++Y+ L+S +P GKLFSW AV DAIP+I++RLPITL LT++GA FGL+LA 
Sbjct: 31 MTSVFLTSGWAFYDYLISPIPHGKLFSWHAVFDAIPNIIQRLPITLGLTLSGATFGLVLA 90 

Query: 61 LIFAVWINRVKILYPIQALFVSFLRGTPILVQLMLSYYGIPLFLKFLNQKYGFDWNINA 120 
LIFA+VKIN+VK+LYPIQA+FVSFLRGTPILVQLML+YYGIPLFLKFLNQKYGFDWN+NA

Sbjct: 91 LI FALVKINKVKLLYP I QAI FVSFLRGTPILVQLMLTYYGI PLFLKFLNQKYGFDWNVNA 150

Query: 121 IPASVFAITAFAFNEAAYTSETIRAAILSvDQGEIEAARSLGMTSAQVYRRVIIPNAAW 180
IPAS+FAITAFAFNEAAY SETIRAAILSVD GEIEAA+SLGMTS QVYRRVI I PNA W
Sbjct: 151 I PAS I FAITAFAFNEAAYASETIRAAILSVDTGEIEAAKSLGMTS VQVYRRVI I PNATW 210

Query: 181 ATPTLINTLIGLTKGTSLAFNAGIvEMFAQAQIMGGSDYRYFERYISVALVYWAVSFLIE 240
A PTLIN LIGLTKGTSLAFNAGIVEMFAQAQI+GGSDYRYFERYISVALVYW++S L+E
Sbjct: 211 AIPTLINGLIGLTKGTSLAFNAGIVEMFAQAQILGGSDYRYFERYISVALVYWSISILME 270

Query: 241 QLGNAIERKMAI KAPRHLTDEI PGGVR 267 

Q+G IE KMAIKAP +E G +R 
Sbjct: 271 QVGRLIENKMAIKAPEQARNEKLGELR 297 

There is also homology to SEQ ID 4794. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 960 

A DNA sequence (GBSx1018) was identified in S.agalactiae <SEQ ID 2927> which encodes the amino
acid sequence <SEQ ID 2928>. This protein is predicted to be amino acid ABC transporter, ATP-binding
protein. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3205 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
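
Each "Final Results" block lists a certainty value for three candidate locations, and the location marked "Affirmative" is the one with the highest value. A minimal sketch of reading such a block follows; the function names and the regular expression are hypothetical and are written only to tolerate the spacing variations visible in this text (for example "Certainty=0 . 3205"). It is not the procedure used to generate the predictions.

```python
import re

_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*"
    r"([\d. ]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text: str) -> dict[str, float]:
    """Map each listed location to its certainty value."""
    results = {}
    for m in _RESULT.finditer(text):
        # Remove stray spaces inside the number, e.g. "0 . 3205" -> "0.3205".
        results[m.group(1)] = float(m.group(2).replace(" ", ""))
    return results


def predicted_location(text: str) -> str:
    """Return the location with the highest certainty."""
    results = parse_final_results(text)
    if not results:
        raise ValueError("no 'Final Results' lines found")
    return max(results, key=results.get)


sample = """\
bacterial cytoplasm Certainty=0 . 3205 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
"""
print(predicted_location(sample), parse_final_results(sample))
```
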

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00329 GB:AF008220 putative amino acid transporter [Bacillus subtilis] 
Identities = 121/247 (48%) , Positives = 176/247 (70%) 



Query: 1 MIKLRQLTKSFSGQKVLDKLDLDIEKGQVVALVGASGAGKSTFIiRSMNYLEEPDYGTIEI 60
MI+++ + K F VL ++L + RG+W ++G SG+GK+TFLR +N LE PD G I I
Sbjct: 1 MIEIKNIHKQFGIHHVLKGINLTvRKGEVVTIIGPSGSGKTTFIjRCIiNLLERPDEGIISI 60

Query: 61 DDFKVDFKSISKDDILTLRRKIAMVFQQFNLFERRTALDNVKEGLKIVKKMSDQEATRIA 120
D ++ + SK ++ LR++ AMVFQQ++LF +T ++NV EGL I +KM Q+A +A
Sbjct: 61 HDKVINCRFPSKKEVHWLRKQTAMVFQQYHLFAHKTVIENVMEGLTIARKMRKQDAYAVA 120

Query: 121 RDEIAKVGIADREKYYPRHLSGGQKQRVALARALAMKPDVLLLDEPTSALDPELVGEVEK 180
+EL KVGL D+ YP LSGGQKQRV +ARALA+ PDVLL DEPT+ALDPELVGEV +
Sbjct: 121 ENELRKVGLQDKLNAYPSQLSGGQKQRVGIARALAIHPDVLLFDEPTAALDPELVGEVLE 180

Query: 181 SIADAAKQGQTMVLVSHDMNFVYQVADKVLFLEKGRILESGTPEQLFNHPLEERTKEFFA 240
+ + KG TM++V+H+M F +V+D+V+F+++G I+E GTPE++F H ++RT++F
Sbjct: 181 VMLEIVKTGATMIVVTHEMEFARRVSDQWFMDEGVIVEQGTPEEVFRHTKKDRTRQFIiR 240

Query: 241 SYNKSYL 247
+ YL
Sbjct: 241 RVSPEYL 247





A related DNA sequence was identified in S.pyogenes <SEQ ID 2929> which encodes the amino acid 
sequence <SEQ ID 2930>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.1840 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 199/247 (80%), Positives = 229/247 (92%)

Query: 1 MIKLRQLTKSFSGQKVLDKLDLDIEKGQVVALVGASGAGKSTFLRSMNYLEEPDYGTIEI 60
MI +R L+K+FSGQKVLD L LDIEKGQV+ALVGASGAGKSTFLRS+NYLE+PD G+I I
Sbjct: 2 MITIRNLSKTFSGQKVLDSLALDIEKGQVIALVGASGAGKSTFLRSLNYLEKPDSGSISI 61

Query: 61
DF VDF++I+ + +L LRRKLAMVFQQFNLFERRTAL+NVKEGLK+VKK+SDQEAT++A
Sbjct: 62

Query: 121
+ ELAKVGLADR+ +YPRHLSGGQKQRVALARALAMKPDVLLLDEPTSALDPELVGEVEK
Sbjct: 122

Query: 181
SI DAAK GQTMVLVSHDMNFVYQVAD+VLFL++G+ILE GTPE++F HP +ERTKEFFA
Sbjct: 182

Query: 241
SY+K+Y+
Sbjct: 242



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 961 

A DNA sequence (GBSx1019) was identified in S.agalactiae <SEQ ID 2931> which encodes the amino
acid sequence <SEQ ID 2932>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0831 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07290 GB:AP001519 thioredoxin reductase (NADPH) [Bacillus halodurans] 
Identities = 173/302 (57%) , Positives = 234/302 (77%) 

Query: 1 MYDTLIIGSGPGGMTAALYAARSNLKVGLIEQGAPGGQMNNTAEIENYPGYDHISGPELS 60
+YD +1 G+GP GMTAA+Y +R+NL ++E+G PGGQM NT ++ENYPG+DHI GPELS
Sbjct: 7 VYDVVIAGAGPAGMTAAWTSRANLSTVMVERGVPGGQMANTEDVENYPGFDHILGPELS 66

Query: 61 MKMYEPLEKFEVEHIYGIVQRVENDGDVKRVITEDESYEAKTVILATGAKNSLLGVPGEE 120
KM+E +KF E+ YG ++ + + GD+K V ++ Y+A+ VI+ATGA+ LGVPGE+
Sbjct: 67 TKMFEHAKKFGAEYAYGDIKEIIDQGDLKLVKAGNKEYKARAVIVATGAEYKKLGVPGEK 126

Query: 121
E + RGVSYCAVCDGAFF+ ++L+WGGGDSAVEEAV+LT+FA VTIIHRRDQLRAQK+
Sbjct: 127

Query: 181
LQ RAF N+KI+F+WD WK+I G + KVS VT+E+ KTGE + GVFIY+G+ P +
Sbjct: 187

Query: 241
V L I ++ G+++T+ M+TS+PG++A GDVR+K LRQI TA G+G++A Q V +YI
Sbjct: 247

Query: 301
Sbjct: 307

A related DNA sequence was identified in S.pyogenes <SEQ ID 2933> which encodes the amino acid
sequence <SEQ ID 2934>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0386 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 236/300 (78%), Positives = 273/300 (90%) 



Query: 1 MYDTLIIGSGPGGMTAALYAARSNLKVGLIEQGAPGGQMNNTAEIENYPGYDHISGPELS 60
MYDTLI IGSGP GMTAALYAARSNL V +IEQGAPGGQMNNT +IENYPGYDHISGPEL+
Sbjct: 1 MYDTLIIGSGPAGMTAALYAARSNLSVAIIEQGAPGGQMNNTFDIENYPGYDHISGPELA 60

Query: 61 MKMYEPLEKFEVEHIYGIVQRVENDGDVKRVITEDESYEAKTVILATGAKNSLLGVPGEE 120
MKMYEPLEKF VE+IYGIVQ++EN GD K V+TED S YEAKTVI +ATGAK +LGVPGEE
Sbjct: 61 MKMYEPLEKFNVENIYGIVQKIENFGDYKCVLTEDASYEAKTVIIATGAKYRVLGVPGEE 120

Query: 121 EYTSRGVSYCAVCDGAFFRDQDLLWGGGDSAVEEAVFLTQFAKSVTIIHRRDQLRAQKV 180
YTSRGVSYCAVCDGAFFRDQDLLWGGGDSAVEEA++LTQFAK VT+ +HRRDQLRAQK+
Sbjct: 121 YYTSRGVSYCAVCDGAFFRDQDLLWGGGDSAVEEAIYLTQFAKKVTWHRRDQLRAQKI 180

Query: 181 LQDRAFANEKI KFVWDS WKE I KGNE I KVSGVTVENLKTGE ISEMTFGGVFI YVGLKPHS 240
LQDRAFAN+K+ F+WDSWKEI+GN+IKVS V +EN+KTG++++ FGGVFIYVG+ P +
Sbjct: 181 I^DRAFANDKyDFIWDSVVKEICGNDIKVSIWLIENVKTGQOTDHAFGGVFIYVGmPvT 240

Query: 241 SMVSELGITDETGWLTDTNMKTSIPGLYAIGDVRQKDLRQIATAVGEGAIAGQGVYNYI 300
MV +L ITD GW++TD +M+TSIPG++AIGDVRQKDLRQI TAVG+GAIAGQGVY+Y+
Sbjct: 241 GMVKDLEITDSEGWIITDDHMRTSIPGIFAIGDVRQKDLRQITTAVGDGAIAGQGVYHYL 300



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 962 

A DNA sequence (GBSx1020) was identified in S.agalactiae <SEQ ID 2935> which encodes the amino
acid sequence <SEQ ID 2936>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3626 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15163 GB:Z99120 similar to nicotinate phosphoribosyltransferase [Bacillus subtilis]
Identities = 309/476 (64%), Positives = 384/476 (79%), Gaps = 2/476 (0%)

Query: 2 YKDDSLTLHTDLYQIN^QVYFNKGIHNKRAVFEAYFRKVPFENGYAVFAGLERIVRYLE 61 

+KDDSL+LHTDLYQINM + Y+ GIH K+A+FE +FR++PFENGYAVFAGLE+ + YLE 
Sbjct: 6 FKDDSLSLHTDLYQINMAETYWRDGIHEKKAIFELFFRRLPFENGYAVFAGLEKAIEYLE 65 

Query: 62 NLSFSDSDLSYLE-ELGYPEEFLDYLKNLKMELTVKSAKEGDLVFANEPLVQIEGPLAQC 120 

N F+DSDLSYL+ ELGY E+F++YL+ L ++ S KEG+LVF NEP++++E PL + 
Sbjct: 66 NFKFTDSDLSYLQDELGYHEDFIEYLRGLSFTGSLYSMKEGELVFNNEPIMRVEAPLVEA 125

Query: 121 QLVETAILNIINYQTLVATKAARIRSVIEDEPLLEFGTRRAQEMDAAIWGTRAAIIGGAN 180

QL+ETA+LNI+NYQTL+ATKAARI+ VI DE LEFGTRRA EMDAA+WG RAAt-IGG + 
Sbjct: 126 QLIETALIMIVNYQTLIATKAARIKGVIGDEVALEFGTRRMEMDAAMWGARAALIGGFS 185 

Query: 181 ATSNVRAGKIFNIPVSGTHAHALVQTYGDDYQAFKAYAETHKDCVFLVDTYDTLRVGVPN 240

ATSNVRAGK FNI PVSGTHAHALVQ Y D+Y AFK YAETHKDCVFLVDTYDTLR G+PN 
Sbjct: 186 ATSNVRAGKRFNIPVSGTHAHALVQAYRDEYTAFKKYAETHKDCVFLVDTYDTLRSGMPN 245 

Query: 241 AIRVAKEMGEKINFLGTOLDSGDLAYLSKKVRQQLDDAGFPNAKIYASNDLDENTILNLK 300 
AIRVAKE G++INF+G+RLDSGDIAYLSKK R+ LD+AGF +AK+ AS+DLDE+TI+NLK

Sbjct: 246 AIRVAKEFGDRINFIGIRLDSGDLAYLSKKARKMLDEAGFTDAKVIASSDLDEHTIMNLK 305 

Query: 301 MQKAKIDVWGVGTKLITAYDQPALGAVYKIVSIETDAGSMRDTIKLSNNAEKVSTPGKKQ 360 
Q A+IDVWGVGTKLITAYDQPALGAVYK+V+IE D G M DTIK+S+N EKV+TPG+K+ 
Sbjct: 306 AQGARIDVWGVGTKLITAYDQPALGAVYKLVAIEED-GKMVDTIKISSNPEKVTTPGRKK 364

Query: 361 VWRITSRAKGKSEGDYITFADTDVTQLDEIEMFHPTYTYINKTVRDFDAVPLLVDIFDKG 420 

V+RI +++ SEGDYI D V + MFHP +T+I+K V +F A L IF+KG 

Sbjct: 365 VYRIINQSNHHSEGDYIALYDEQVNDQKRLRMFHPVHTFISKFVTNFYAKDLHELIFEKG 424






Query: 421 KLVYQLPSLQEIQEYGRKEFDQLWDEYKRVLNPQDYPVDLARDVWQNKMDLIDRIR 476

L YQ P + +IQ+Y + LW+EYKR+ P++YPVDL+ D W NKM I ++ 

Sbjct: 425 ILCYQNPEISDIQQYVQDNLSLLWEEYKRISKPEEYPVDLSEDCWSNKMQRIHEVK 480 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2937> which encodes the amino acid
sequence <SEQ ID 2938>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3192 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 409/484 (84%), Positives = 446/484 (91%)

Query: 1 MYKDDSLTLHTDLYQINMMQVYFNKGIHNKRAVFEAYFRKVPFENGYAVFAGLERIVRYL 60
MYKDDSLTLHTDLYQINMMQVYF +GIHN+ AVFE YFRK PF NGYAVFAGL+R+V YL
Sbjct: 1 MYKDDSLTLHTDLYQINMMQVYFEQGIHNRHAVFEVYFRKEPFNNGYAVFAGLQRMVEYL 60

Query: 61 ENLSFSDSDLSYLEELGYPEEFLDYLKNLKMELTVKSAKEGDLVFANEPLVQIEGPLAQC 120
E FS++DL+YLEELGYPE FL YLK L++ELT++SAKEGDLVFANEP+VQ+EGPL QC
Sbjct: 61 EQFQFSETDLAYLEELGYPENFLTYLKELRLELTIRSAKEGDLVFANEPIVQVEGPLGQC 120

Query: 121 QLVETAILNIINYQTLVATKAARIRSVIEDEPLLEFGTRRAQEMDAAIWGTRAAIIGGAN 180
QLVETA+LNI+N+QTL+ATKAARIRSVIEDEPLLEFGTRRAQE+DAAIWGTRAA+IGGA+
Sbjct: 121 QLVETALLNIVNFQTLIATKAARIRSVIEDEPLLEFGTRRAQELDAAIWGTRAAMIGGAD 180

Query: 181 ATSNVRAGKIFNIPVSGTHAHALVQTYGDDYQAFKAYAETHKDCVFLVDTYDTLRVGVPN 240
ATSNVRAGK F+IPVSGTHAHALVQ YG+DY AF AYA+THKDCVFLVDTYDTL+VGVP
Sbjct: 181 ATSNVRAGKRFDIPVSGTHAHALVQAYGNDYDAFMAYAKTHKDCVFLVDTYDTLKVGVPT 240

Query: 241 AIRVAKEMGEKINFLGVRLDSGDLAYLSKKVRQQLDDAGFPNAKIYASNDLDENTILNLK 300
AIRVAKEMG+KINFLGVRLDSGDLAYLSK VRQQLDDAGF AKIYASNDLDENTILNLK
Sbjct: 241 AIRVAKEMGDKINFLGVRLDSGDLAYLSKTVRQQLDDAGFTEAKIYASNDLDENTILNLK 300

Query: 301 MQKAKIDVWGVGTKLITAYDQPALGAVYKIVSIETDAGSMRDTIKLSNNAEKVSTPGKKQ 360
MQKAKIDVWGVGTKLITAYDQPALGAVYKIVSIE + GSMRDTIKLSNNAEKVSTPGKKQ
Sbjct: 301 MQKAKIDVWGVGTKLITAYDQPALGAVYKIVSIEQEDGSMRDTIKLSNNAEKVSTPGKKQ 360

Query: 361 VWRITSRAKGKSEGDYITFADTDVTQLDEIEMFHPTYTYINKTVRDFDAVPLLVDIFDKG 420
VWRITSR KGKSEGDYITF D +V +L EIEMFHPTYTYI KTV++FDA+PLLVDIF KG
Sbjct: 361 VWRITSREKGKSEGDYITFTDim/NELTEIEMFHPTYTYIKKTVKEFDAIPLLVDIFVKG 420

Query: 421 KLVYQLPSLQEIQEYGRKEFDQLWDEYKRVLNPQDYPVDLARDVWQNKMDLIDRIRKEAL 480
+LVYQLP+L EI+ Y +KEFD+LWDEYKRVLNPQDYPVDLARDVWQNKM LID IRK+A
Sbjct: 421 ELVYQLPTJ^IKAYAKKEFDKLWDEYKRVLNPQDYPVDIJ^WQNKmLIDNIRKDAY 480

Query: 481 AKGE 484
K E
Sbjct: 481 GKSE 484

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
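
Each alignment in these examples is summarised by a statistics line such as `Identities = 409/484 (84%), Positives = 446/484 (91%)`. The sketch below, in Python, shows one way such a line could be read back into counts; the function names are illustrative assumptions, not part of the original tooling. Note that for this line the exact identity ratio is about 84.5%, which is reported as 84%, so the reported percentage is kept as a separate field rather than recomputed.

```python
import re
from fractions import Fraction

# Matches e.g. "Identities = 409/484 (84%)"; extra spaces are tolerated.
STATS_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_alignment_stats(line):
    """Map each statistic name to (count, alignment_length, reported_percent)."""
    return {
        name: (int(count), int(total), int(pct))
        for name, count, total, pct in STATS_RE.findall(line)
    }


def exact_percent(count, total):
    """Exact percentage computed from the raw counts."""
    return float(Fraction(count, total) * 100)


stats = parse_alignment_stats(
    "Identities = 409/484 (84%), Positives = 446/484 (91%)"
)
n, total, reported = stats["Identities"]
print(n, total, reported, round(exact_percent(n, total), 2))
```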

Example 963 

A DNA sequence (GBSx1021) was identified in S.agalactiae <SEQ ID 2939> which encodes the amino
acid sequence <SEQ ID 2940>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2744 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC74810 GB:AE000269 NAD synthetase, prefers NH3 over glutamine [Escherichia coli K12]
Identities = 173/274 (63%), Positives = 214/274 (77%), Gaps = 1/274 (0%)



Query: 1 MTLQDQIIKELGVKPVINPSQEIRRSVEFLKDYLLKHSFLKTYVLGISGGQDSTLAGRLA 60 

MTLQ QIIK LG KP IN +EIRRSV+FLK YL + F+K+ VLGISGGQDSTLAG+L 
Sbjct: 1 MTLQQQIIKALGAKPQINAEEEIRRSVDFLKSYLQTYPFIKSLVLGISGGQDSTLAGKLC 60

Query: 61 QLAVEELRADTG-ENYQFIAIRLPYGIQADEEDAQKALDFIKPDIALTINIKEAVDGQVR 119 

Q+A+ ELR +TG E+ QFIA+RLPYG+QADE+D Q A+ FI+PD LT+NIK AV + 
Sbjct: 61 QMAINELRLETGNESLQFIAVRLPYGVQADEQDCQDAIAFIQPDRVLTVNIKGAVLASEQ 120

Query: 120 ALNAAGVEITDFNKGNIKARQRMISQYAVAGQYAGAVIGTDHAAENITGFFTKFGDGGAD 179
AL AG+E++DF +GN KAR+RM +QY++AG +G V+GTDHAAE ITGFFTK+GDGG D
Sbjct: 121 ALREAGIELSDFVRGNEKARERMKAQYSIAGMTSGVVVGTDHAAEAITGFFTKYGDGGTD 180

Query: 180 LLPLFRLNKSQGKQLLAELGADKALYEKIPTADLEENKPGIADEIALGVTYQEIDAYLEG 239
+ PL+RLNK QGKQLLA L + LY+K PTADLE+++P + DE+ALGVTY ID YLEG
Sbjct: 181 INPLYRLNKRQGKQLLAALACPEHLYKKAPTADLEDDRPSLPDEVALGVTYDNIDDYLEG 240

Query: 240 KVVSDKSRGIIENWWYKGQHKRHLPITIFDDFWK 273
K V + IENW+ K +HKR PIT+FDDFWK
Sbjct: 241 KNVPQQVARTIENWYLKTEHKRRPPITVFDDFWK 274

A related DNA sequence was identified in S.pyogenes <SEQ ID 2941> which encodes the amino acid
sequence <SEQ ID 2942>. Analysis of this protein sequence reveals the following: 

Possible site: 18 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3482 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 213/274 (77%), Positives = 242/274 (87%), Gaps = 1/274 (0%)

Query: 1 MTLQDQIIKELGVKPVINPSQEIRRSVEFLKDYLLKHSFLKTYVLGISGGQDSTLAGRLA 60
MTLQ++II++LGVK I+P +EIR++V+FLK YL KHSFLKTYVLGISGGQDSTLAG+LA
Sbjct: 15 MTLQEEIIRQLGVKASIDPQEEIRKAVDFLKAYLRKHSFLKTYVLGISGGQDSTLAGKLA 74

Query: 61 QLAVEELRADTGEN-YQFIAIRLPYGIQADEEDAQKALDFIKPDIALTINIKEAVDGQVR 119
Q+A+ ELR + + YQFIA+RLPYG+QADE DAQKAL FI PD LTINIK AVDGQV
Sbjct: 75 QMAIAELREEASDQAYQFIAVRLPYGVQADEADAQKALAFIAPDQTLTINIKAAVDGQVE 134

Query: 120 ALNAAGVEITDFNKGNIKARQRMISQYAVAGQYAGAVIGTDHAAENITGFFTKFGDGGAD 179
AL AAGVEI+DFNKGNIKARQRMISQYA+AGQ AGAVIGTDHAAENITGFFTKFGDGGAD
Sbjct: 135 ALQAAGVEISDFNKGNIKARQRMISQYAIAGQMAGAVIGTDHAAENITGFFTKFGDGGAD 194

Query: 180 LLPLFRLNKSQGKQLLAELGADKALYEKIPTADLEENKPGIADEIALGVTYQEIDAYLEG 239
+LPLFRLNK QGK LL LGAD ALYEK+PTADLE+ KPG+ADE+ALGVTYQ+ID YLEG
Sbjct: 195 ILPLFRLNKRQGKALLKVLGADAALYEKVPTADLEDQKPGLADEVALGVTYQDIDDYLEG 254

Query: 240 KVVSDKSRGIIENWWYKGQHKRHLPITIFDDFWK 273
K++S ++ IE WW+KGQHKRHLPITIFDDFWK
Sbjct: 255 KLISKVAQATIEKWWHKGQHKRHLPITIFDDFWK 288

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 964 

A DNA sequence (GBSx1022) was identified in S.agalactiae <SEQ ID 2943> which encodes the amino
acid sequence <SEQ ID 2944>. Analysis of this protein sequence reveals the following: 
Possible site: 28 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2718 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA82960 GB:Z30315 aminopeptidase C [Streptococcus thermophilus]
Identities = 363/444 (81%), Positives = 407/444 (90%)



Query: 1 MSKLTQTFTDKLFADYQANTKFSAIENAVTHNGLLKSLETRQSEIENDYVFSIDLTKDEV 60
M+ L+ FT+KLFADY+AN K+ AIENAVTHNGLLKS+ETRQSE+END+VFSIDLTKDEV
Sbjct: 1 MTSLSTDFTEKLFADYEANAKYGAIENAVTHNGLLKSIETRQSEVENDFVFSIDLTKDEV 60

Query: 61 SNQKQSGRCWMFAALNTFRHKLISDFKLENFELSQAHTFFWDKYEKSNWFMEQIIATANQ 120
SNQK SGRCWMFAALNTFRHKLISDFKLE+FELSQAHTFFWDKYEKSNWF+EQIIATA+Q
Sbjct: 61 SNQKASGRCWMFAALNTFRHKLISDFKLESFELSQAHTFFWDKYEKSNWFLEQIIATADQ 120

Query: 121 ELSSRKVKFLLDVPQQDGGQWDMVVALFEKYGVVPKTVYPESVSSSASRELNQYLNKLLR 180
E+ SRKVKFLLD PQQDGGQWDMVV+LFEKYGVVPK+VYPESV+SS SRELNQYLNKLLR
Sbjct: 121 EIGSRKVKFLLDTPQQDGGQWDMVVSLFEKYGVVPKSVYPESVASSNSRELNQYLNKLLR 180

Query: 181 QDAQILRELIAQGADGATVQNKKEELLQEIFNFLAMNLGLPPQSFDFAYRDKDNHYQSDK 240
QDAQILR+LIA GAD A VQ KKEE LQEIFN+LAM LGLPP+ FDFAYRDKD++Y+S+K
Sbjct: 181 QDAQILRDLIASGADQAAVQAKKEEFLQEIFNYLAMTLGLPPRQFDFAYRDKDDNYRSEK 240

Query: 241 NITPKAFYQKYVNLDLSDYVSIINAPTVDKPYGQSYTVEMLGNVVGGPAVKYLNLDMKRF 300
ITP+AF++KYV L LSDYVS+INAPT DKPYG+SYTVEMLGNVVG P+V+Y+NL M RF
Sbjct: 241 GITPRAFFEKYVGLKLSDYVSVINAPTADKPYGKSYTVEMLGNVVGAPSVRYINLPMDRF 300

Query: 301 KELAIAQMKSGETVWFGSDVGQVSNRQKGILATTTYDFNSSMDIKLSQDKAGRLDYSESL 360
KELAIAQMK+GE+VWFGSDVGQVS+RQKGILAT YDF +SMDI +QDKAGRLDYSESL
Sbjct: 301 KELAIAQMKAGESVWFGSDVGQVSDRQKGILATNVYDFTASMDINWTQDKAGRLDYSESL 360

Query: 361 MTHAMVLTGVDLDESGQPLKWKVENSWGEKVGKDGYFVASDAWMDEYTYQIVVRKELLTK 420
MTHAMVLTGVDLD G+P+KWK+ENSWG+KVG+ GYFVASDAWMDEYTYQIVVRK+ LT
Sbjct: 361 MTHAMVLTGVDLDADGKPIKWKIENSWGDKVGQKGYFVASDAWMDEYTYQIVVRKDFLTA 420

Query: 421 EELEAYNAEPITLAPWDPMGALAN 444
EEL AY A+P LAPWDPMG+LA+
Sbjct: 421 EELAAYEADPQVLAPWDPMGSLAS 444

A related DNA sequence was identified in S.pyogenes <SEQ ID 2945> which encodes the amino acid 
sequence <SEQ ID 2946>. Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3002 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 369/443 (83%), Positives = 407/443 (91%)



Query: 1 MSKLTQTFTDKLFADYQANTKFSAIENAVTHNGLLKSLETRQSEIENDYVFSIDLTKDEV 60
MS LT+TFT++LFA Y+AN KFSAIENAVTHNGLLKSLETRQSE++ND+VFSIDLTKD+V
Sbjct: 1 MSALTETFTEQLFAHYEANAKFSAIENAVTHNGLLKSLETRQSEVDNDFVFSIDLTKDKV 60

Query: 61 SNQKQSGRCWMFAALNTFRHKLISDFKLENFELSQAHTFFWDKYEKSNWFMEQIIATANQ 120
SNQK SGRCWMFAALNTFRHKLI++FKLENFELSQAHTFFWDKYEK+NWFMEQ+IATA+Q
Sbjct: 61 SNQKASGRCWMFAAIOTFRHKLITEFiaEireEIiSQA^ 120

Query: 121 ELSSRKVKFLLDVPQQDGGQWDMVVALFEKYGVVPKTVYPESVSSSASRELNQYLNKLLR 180
EL+SRKVKFLLDVPQQDGGQWDMVV+LFEKYGVVPK+VYPES+SSS SRELNQYLNKLLR
Sbjct: 121 ELTSRKVKFLLDVPQQDGGQWDMVVSLFEKYGVVPKSVYPESISSSNSRELNQYLNKLLR 180

Query: 181 QDAQILRELIAQGADGATVQNKKEELLQEIFNFLAMNLGLPPQSFDFAYRDKDNHYQSDK 240
QDAQILR+LIA GA V+++K ELLQEIFNFLAM LGLPP+ FDFAYRDKD+HY +K
Sbjct: 181 QDAQILRDLIASGAKADQVEDRKAELLQEIFNFLAMTLGLPPRHFDFAYRDKDDHYHVEK 240

Query: 241 NITPKAFYQKYVNLDLSDYVSIINAPTVDKPYGQSYTVEMLGNVVGGPAVKYLNLDMKRF 300
+TP+AFY K+V L LSDYVS+INAPT DKPYG+SYTVEMLGNVVG V+YLNLDMKRF
Sbjct: 241 GLTPQAFYDKFVGLKLSDYVSVINAPTADKPYGKSYTVEMLGNVVGSREVRYLNLDMKRF 300

Query: 301 KELAIAQMKSGETVWFGSDVGQVSNRQKGILATTTYDFNSSMDIKLSQDKAGRLDYSESL 360
KELAI QM++GE+VWFGSDVGQVS+RQKGILAT TYDF +SMDI LSQDKAGRLDYSESL
Sbjct: 301 KELAIKQMQAGESVWFGSDVGQVSDRQKGILATNTYDFEASMDINLSQDKAGRLDYSESL 360

Query: 361 MTHAMVLTGVDLDESGQPLKWKVENSWGEKVGKDGYFVASDAWMDEYTYQIVVRKELLTK 420
MTHAMVLTGVDLDE+G+PLKWKVENSWGEKVG GYFVASDAWMDEYTYQIVVRKE LT
Sbjct: 361 MTHAMVLTGVDLDETGKPLKWKVENSWGEKVGDKGYFVASDAWMDEYTYQIVVRKEFLTA 420

Query: 421 EELEAYNAEPITLAPWDPMGALA 443
+EL AY EP LAPWDPMGALA
Sbjct: 421 DELAAYEKEPQVLAPWDPMGALA 443

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 965 

A DNA sequence (GBSx1024) was identified in S.agalactiae <SEQ ID 2947> which encodes the amino
acid sequence <SEQ ID 2948>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9533> which encodes amino acid sequence <SEQ ID 9534> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF17262 GB:AF210752 penicillin-binding protein 1A [Streptococcus pneumoniae]
Identities = 412/725 (56%), Positives = 544/725 (74%), Gaps = 14/725 (1%)

Query: 4 IKKESVIKLLKYAFGIIMGFIILAIVIGGLLFAYYVSRSPKLTDQALKSVNSSLVYDGNN 63
+ K ++++L+KY + +I AIV+GG +F YYVS++P L++ L + SS +YD N
Sbjct: 1 MNKPTILRLIKYLSISFLSLVIAAIVLGGGVFFYYVSKAPSLSESKLVATTSSKIYDNKN 60

Query: 64 KLIADLGSEKRESVSADSIPLNLVNAITSIEDKRFFKHRGVDIYRILGAAWHNLVSSNTQ 123 

+LIADLGSE+R + A+ IP +LV AI SIED RFF HRG+D RILGA NL S++ Q 
Sbjct: 61 QLIADLGSERRVNAQANDIPTDLVKAIVSIEDHRFFDHRGIDTIRILGAFLRNLQSNSLQ 120 


Query: 124 GGSTLDQQLIKLAYFSTNKSDQTLKRKSQEVWLALQMERKYTKEEILTFYINKVYMGNGN 183

GGSTL QQLIKL YFST+ SDQT+ RK+QE WLA+Q+E+K TK+EILT+YINKVYM NGN 
Sbjct: 121 GGSTLTQQLIKLTYFSTSTSDQTISRKAQEAWLAIQLEQKATKQEILTYYINKVYMSNGN 180 

Query: 184 YGMRTTAKSYFGKDLKELSIAQLALLAGIPQAPTQYDPYKNPESAQTRRNTVLQQMYQDK 243

YGM+T A++Y+GKDL LS+ QLALLAG+PQAP QYDPY +PE+AQ RRN VL +M 
Sbjct: 181 YGMQTAAQNYYGKDLNNLSLPQLALIAGMPQAPNQY 240 

Query: 244 NISKKEYDQAVATPOTDGLKELKQKSTYPKYMDNYLKQVT 303 
IS ++Y++AV TP+TDGL+ LK S YP YMDNYLK+VI++V+++TG ++ T G+ VYT

Sbjct: 241 YISAEQYEKAVNTPITDGLQSLKSASNYPAYMDNYLKEVINQVEEETGYNLLTTGMDVYT 300 

Query: 304 NINTDAQKQLYDIYNSDTYIAYPNNELQIASTIMDATNGKVIAQLGGRHQNENISFGTNQ 363 
N++ +AQK L+DIYN+D Y+AYP++ELQ+ASTI+D +NGKVIAQLG RHQ+ N+SFG NQ 
Sbjct: 301 NVDQEAQKHLWDIYNTDEYVAYPDDELQVASTIVDVSNGKVIAQLGARHQSSNVSFGINQ 360

Query: 364 SVLTDRDWGSTMKPISAYAPAIDSGVYNSTGQSLNDSVYYWPGTSTQLYDWDRQYMGWMS 423 

+V T+RDWGSTMKPI+ YAPA++ GVY+ST ++D Y +PGT T +Y+WDR Y G ++ 
Sbjct: 361 AVETNRDWGSTMKPITDYAPALEYGVYDSTATIVHDEPYNYPGTDTPVYNWDRGYFGNIT 420 


Query: 424 MQTAIQQSRNVPAVlUUjEAAGLDEAKSFLEKLGIYYPEMNYSNAISSNNSSSDAKYGASS 483 

+Q A+QQSRNVPAV L GL+ AK+FL LGI YP ++YSNAISSN + SD KYGASS 
Sbjct: 421 LQYALQQSRNVPAVETLNKVGIiNRAKTFLiNGLGIDYPSLHYSNAISSNTTESDKKYGASS 480 

Query: 484 EKMAAAYSAFANGGTYYKPQYVNKIEFSDGTNDTYAASGSRAMKETTAYMMTDMLKTVLT 543

EKMAAAY+AFANGGTYYKP Y++K+ FSDG+ ++ G+RAMKETTAYMMTDM+KTVL
Sbjct: 481 EKMAAAYAAFANGGTYYKPMYIHKVVFSDGSEKEFSNVGTRAMKETTAYMMTDMMKTVLV 540

Query: 544 FGTGTKAAIPGVAQAGKTGTSNYTEDELAKIEATTGIYNSAVGTMAPDENFVGYTSKYTM 603 
+G G A +P + QAGKTGTSNYT++E+ K Y G +APDE FVGYT KY M

Sbjct: 541 YGIGRGAYLPWLPQAGKTGTSNYTDEEIEK YIKNTGYVAPDEMFVGYTRKYAM 593 

Query: 604 AIWTGYKNRLTPLYGSQLDIATEVYRAMMSYLTGGYSA-DWTMPEGLYRSGSYLYINGTT 662 
A+WTGY NRLTPL G L +A +VYR+MM+YL+ G + DW +PEGLYR+G +++ NG 
Sbjct: 594 AVWTGYSNRLTPLVGDGLVVAAKVYRSMMTYLSEGSNPEDWNIPEGLYRNGEFVFKNGAR 653

Query: 663 TTGTYSSSvYKNIYQNSGQSSQSSSSTSSEKQKEDKNTANDANSSSPQVETPNNGNATTP 722 

+T +SS + S +SS SSS +S+ + + N++ +++P T + TTP 

Sbjct: 654 ST--WSSPAPQQ~-PPSTESSSSSSDSSTSQSNSTTPSTNNSTTTNPNNNTQQSN--TTP 707 






Query: 723 NNSNQ 727 

+ NQ 
Sbjct: 708 DQQNQ 712 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2949> which encodes the amino acid
sequence <SEQ ID 2950>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-13.96 Transmembrane 19 - 35 ( 9 - 43)

Final Results 

bacterial membrane --- Certainty=0.6583 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAA88918 GB:Z49095 penicillin-binding protein 1a [Streptococcus pneumoniae]
Identities = 422/712 (59%), Positives = 536/712 (75%), Gaps = 8/712 (1%)

Query: 4 IKNPKILKWLKYVLSAILSLIILVIIIGGLLFTFYISSAPKLSEAQLKSTNSSLVYDGNN 63 

+ P IL+ +KY+ + LSL+I I++GG +F +Y+S AP LSE++L +T SS +YD N 
Sbjct: 1 MNKPTILRLIKYLSISFLSLVIAAIVLGGGVFFYYVSKAPSLSESKLVATTSSKIYDNKN 60 

Query: 64 NLIADLGSEKRENVTADSIPINLVNAITSIEDKRFFNHRGVDLYRIFGAAFHNLTSQTTQ 123

LIADLGSE+R N A+ IP +LV AI SIED RFF+HRG+D RI GA NL S + Q 
Sbjct: 61 QLIADLGSERRVNAQANDIPTDLVKAIVSIEDHRFFDHRGIDTIRILGAFLRNLQSNSLQ 120

Query: 124 GGSTLDQQLIKLAYFSTNESDQTLKRKAQEVWLALQMERKYTKQEILTFYINKVYMGNGN 183 
GGSTL QQLIKL YFST+ SDQT+ RKAQE WLA+Q+E+K TKQEILT+YINKVYM NGN

Sbjct: 121 GGSTLTQQLIKLTYFSTSTSDQTISRKAQEAWLAIQLEQKATKQEILTYYINKVYMSNGN 180

Query: 184 YGMLTAAKSYYGKDLKDLSYAQLALLAGIPQAPSQYDPYLHPEAAQNRRNVVLQQMYMEK 243
YGM TAA++YYGKDL +LS QLALLAG+PQAP+QYDPY HPEAAQ+RRN+VL +M +
Sbjct: 181 YGMQTAAQNYYGKDLNNLSLPQLALLAGMPQAPNQYDPYSHPEAAQDRRNLVLSEMKNQG 240

Query: 244 HLTKAEYBTAIATPVAEGLQSLQQRSTYPKYMDNYLKQ^IEEVKKETNKDIFTAGLKVYT 303 

+++ +YE A+ TP+ +GLQSL+ S YP YMDNYLK+VI +V++ET ++ T G+ VYT 
Sbjct: 241 YISAEQYEKAVNTPITDGLQSLKSASNYPAYMDNYLKEVINQVEEETGYNLLTTGMDVYT 300 


Query: 304 NIIPDAQQTLYNIYHSGDYVYYPDQDFQVASTIVDVTNGHVIAQLGGRNQDENVSFGTNQ 363 

N+ +AQ+ L++IY+S YV YPD D QVAST+VDV+NG VIAQLG R+Q NVSFGTNQ 
Sbjct: 301 NVDQEAQKHLWDIYNSDQYVSYPDDDLQVASTVVDVSNGKVIAQLGARHQASNVSFGTNQ 360 

Query: 364 AVLTDRDWGSTMKPITAYAPAIESGVYTSTAQSTNDSVYYWPGTTTQLFNWDLRYNGWMT 423

AV T+RDWGS+MKPIT YAPA+E GVY STA +D Y +PGT T L+NWD Y G +T 
Sbjct: 361 AVBTNRDWGSSMKPITDYAPALEYGVYDSTASIVHDVPYNYPGTDTPLYNWDHVYFGNIT 420 

Query: 424 IQAAIMLSRNVPAVRALEAAGLDYARSFLSSLGINYPEMHYSNAISSNNSSSDKKYGASS 483 
IQ A+ SRNV AV L GLD A++FL+ LGI+YP MHY+NAISSN + S+KKYGASS

Sbjct: 421 IQYALQQSRNVTAVETLNKVGLDRAKTFLNGLGIDYPSMHYANAISSNTTESNKKYGASS 480 

Query: 484 EKMAAAYAAFANGGIYHKPRYVNKVEFSDGTSKTFDEKGKRAMKETTAYMMTDMLKTVLT 543
EKMAAAYAAFANGGIYHKP Y+NK+ FSDG+ K F + G RAMKETTAYMMT+M+KTVLT
Sbjct: 481 EKMAAAYAAFANGGIYHKPMYINKIVFSDGSEKEFSDAGTRAMKETTAYMMTEMMKTVLT 540

Query: 544 YGTGTAAAIPGVAQAGKTGTSNYTDEELAKIGEKYGLYPDYVGTLAPDENFVGFTKRYAM 603 

YGTG A +P + QAGKTGTSNYTDEE+ K Y G +APDE FVG+T++YAM 

Sbjct: 541 YGTGRGAYLPWLPQAGKTGTSNYTDEEIEK YIKNTGYVAPDEMFVGYTRKYAM 593 


Query: 604 AVWTGYKNRLTPVYGSSLEIASDVYRSMMTYLT-NGYSEDWTMPNGLYRSGGFLYLSGTY 662 

AVWTGY NRLTP+ G +A VYRSM+TYL+ + DWTMP+GLYR+G F++ +G 
Sbjct: 594 AVWTGYSNRLTPIIGDGFLVAGKVYRSMITYLSEDDQPGDWTMPDGLYRNGEFVFKNGAR 653 

Query: 663 ASNTDYTNSVYNNLYSNNTTTASSQTTSDDTSSSNDTSNSTNTDNNGSHPST 714

++ + + S+++++ SS + S+ T+ S + S +TN +NN +T 

Sbjct: 654 STWSSPAPQQPPSTESSSSSSDSSTSQSNSTTPSTNNSTTTNPNNNTQQSNT 705 



An alignment of the GAS and GBS proteins is shown below.

Identities = 521/729 (71%), Positives = 621/729 (84%), Gaps = 10/729 (1%)

Query: 1 MITIKKESVIKLLKYAFGIIMGFIILAIVIGGLLFAYYVSRSPKLTDQALKSVNSSLVYD 60
+ITIK ++K LKY I+ IIL I+IGGLLF +Y+S +PKL++ LKS NSSLVYD
Sbjct: 1 VITIKNPKILKWLKYVLSAILSLIILVIIIGGLLFTFYISSAPKLSEAQLKSTNSSLVYD 60

Query: 61 GNNKLIADLGSEKRESVSADSIPLNLVNAITSIEDKRFFKHRGVDIYRILGAAWHNLVSS 120
GNN LIADLGSEKRE+V+ADSIP+NLVNAITSIEDKRFF HRGVD+YRI GAA+HNL S
Sbjct: 61 GNNNLIADLGSEKRENVTADSIPINLVNAITSIEDKRFFNHRGVDLYRIFGAAFHNLTSQ 120

Query: 121 NTQGGSTLDQQLIKLAYFSTNKSDQTLKRKSQEVWLALQMERKYTKEEILTFYINKVYMG 180
TQGGSTLDQQLIKLAYFSTN+SDQTLKRK+QEVWLALQMERKYTK+EILTFYINKVYMG
Sbjct: 121 TTQGGSTLDQQLIKLAYFSTNESDQTLKRKAQEVWLALQMERKYTKQEILTFYINKVYMG 180

Query: 181 NGNYGMRTTAKSYFGKDLKELSIAQLALLAGIPQAPTQYDPYKNPESAQTRRNTVLQQMY 240
NGNYGM T AKSY+GKDLK+LS AQLALLAGIPQAP+QYDPY +PE+AQ RRN VLQQMY
Sbjct: 181 NGNYGMLTAAKSYYGKDLKDLSYAQLALLAGIPQAPSQYDPYLHPEAAQNRRNVVLQQMY 240

Query: 241 QDKNISKKEYDQAVATPVTDGLKELKQKSTYPKYMDNYLKQVISEVKQKTGKDIFTAGLK 300
+K+++K EY+ A+ATPV +GL+ L+Q+STYPKYMDNYLKQVI EVK++T KDIFTAGLK
Sbjct: 241 MEKHLTKAEYETAIATPVAEGLQSLQQRSTYPKYMDNYLKQVIEEVKKETMKDIFTAGLK 300

Query: 301 VYTNINTDAQKQLYDIYNSDTYIAYPNNELQIASTIMDATNGKVIAQLGGRHQNENISFG 360
VYTNI DAQ+ LY+IY+S Y+ YP+ + Q+ASTI+D TNG VIAQLGGR+Q+EN+SFG
Sbjct: 301 VYTNIIPDAQQTLYNIYHSGDYVYYPDQDFQVASTIVDVTNGHVIAQLGGRNQDENVSFG 360

Query: 361 TNQSVLTDRDWGSTMKPISAYAPAIDSGVYNSTGQSLNDSVYYWPGTSTQLYDWDRQYMG 420
TNQ+VLTDRDWGSTMKPI+AYAPAI+SGVY ST QS NDSVYYWPGT+TQL++WD +Y G
Sbjct: 361 TNQAVLTDRDWGSTMKPITAYAPAIESGVYTSTAQSTNDSVYYWPGTTTQLFNWDLRYNG 420

Query: 421 WMSMQTAIQQSRNVPAVRALEAAGLDEAKSFLEKLGIYYPEMNYSNAISSNNSSSDAKYG 480
WM++Q AI SRNVPAVRALEAAGLD A+SFL LGI YPEM+YSNAISSNNSSSD KYG
Sbjct: 421 WMTIQAAIMLSRNVPAVRALEAAGLDYARSFLSSLGINYPEMHYSNAISSNNSSSDKKYG 480

Query: 481 ASSEKMAAAYSAFANGGTYYKPQYVNKIEFSDGTNDTYAASGSRAMKETTAYMMTDMLKT 540
ASSEKMAAAY+AFANGG Y+KP+YVNK+EFSDGT+ T+ G RAMKETTAYMMTDMLKT
Sbjct: 481 ASSEKMAAAYAAFANGGIYHKPRYVNKVEFSDGTSKTFDEKGKRAMKETTAYMMTDMLKT 540

Query: 541 VLTFGTGTKAAIPGVAQAGKTGTSNYTEDELAKIEATTGIYNSAVGTMAPDENFVGYTSK 600
VLT+GTGT AAIPGVAQAGKTGTSNYT++ELAKI G+Y VGT+APDENFVG+T +
Sbjct: 541 VLTYGTGTAAAIPGVAQAGKTGTSNYTDEELAKIGEKYGLYPDYVGTLAPDENFVGFTKR 600

Query: 601 YTMAIWTGYKNRLTPLYGSQLDIATEVYRAMMSYLTGGYSADWTMPEGLYRSGSYLYING 660
Y MA+WTGYKNRLTP+YGS L+IA++VYR+MM+YLT GYS DWTMP GLYRSG +LY++G
Sbjct: 601 YAMAVWTGYKNRLTPVYGSSLEIASDVYRSMMTYLTNGYSEDWTMPNGLYRSGGFLYLSG 660

Query: 661 TTTTGT-YSSSVYKNIYQNSGQSSQSSSSTSSEKQKEDKNTANDANSSSPQVETPNNGNA 719
T + T Y++SVY N+Y N ++++ SS+ +D +++ND ++S+ T NNG+
Sbjct: 661 TYASNTDYTNSVYNNLYSN NTTTASSQTTSDDTSSSNDTSNST NTDNNGSH 711

Query: 720 TTPNNSNQT 728
+ ++ T
Sbjct: 712 PSTDDKKTT 720

A related GBS gene <SEQ ID 8695> and protein <SEQ ID 8696> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 6.55
GvH: Signal Score (-7.5): -1.98
Possible site: 36

>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 4.03 threshold: 0.0
PERIPHERAL Likelihood = 4.03 201
modified ALOM score: -1.31

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

57.5/76.2% over 712aa
Streptococcus pneumoniae
GP|6563351| penicillin-binding protein 1A Insert characterized
ORF00399(310 - 2484 of 2850)
GP|6563351|gb|AAF17262.1|AF210752_1|AF210752(1 - 713 of 719) penicillin-binding protein 1A {Streptococcus pneumoniae}
%Match = 43.8
%Identity = 57.5 %Similarity = 76.2
Matches = 412 Mismatches = 166 Conservative Sub.s = 134

237 267 297 327 357 387 417 447 

LI ISEKMDFS *RRVPFIiKSLT* ILLKKNY*AVITIKKESVIKLLKKAFGI IMGFI ILAIVIGGLLFAYYVSRSPKLTDQA 

: | ::::|:|| : ::| |||:|| :| ||||::| |:= 

MNKPTILRLIKYLSISFLSLVIAAI VLGGGVFFYYVSKAPSLSESK 
10 20 30 40 

477 507 537 567 597 627 657 687 

LKSWSSLvYDGNNKLIADLGSEKRESVSADSIPmLvNAITSIEDKRFFKHRGVDIYRILGAAWHNLVSSNTQGGSTLD 

i : ii hi 1 = 11111111 = 1 = i= ii =n ii mi ill 111 = 1 inn ii i = = linn 

LVATTSSKIYDNKNQLIADLGSERRVNAQANDIPTDLVKAIVSIEDHRFFDHRGIDTIRILGAFLRNLQSNSLQGGSTLT 
60 70 80 90 100 110 120 

717 747 777 807 837 867 897 927 

QQLIKIAYFSTNKSDQTLKRKSQEW1ALQMERKYTKEEILTFYINKVYMGNGNYGMRTTAKSYFGKDLKELS 

mm iiii= iiii= ii = ii 111 = 1 = 1 = 1 ii = iiii = iimii mum -mm ii= iiiii 

QQLIKLTYFSTSTSDC/TISRKRQEAWLAIQLEQKATKQEIL^ 

140 150 160 170 180 190 200 

957 987 1017 1047 1077 1107 1137 1167 

AGIPQAPTQYDPYKNPESAQTRRimTLQQMYQDKNISKKEYD 

11=1111 lllll =11=11 III II =1 11 ==l==ll 11=1111= II I II lllllll=ll==l=== 

AGMPOAPNQYDPYSHPEAAQDRRNLVLSEMKNQGYISAEQYEKAVNTPITDGLQSLKSASNYPAYMDNYLKEVINQVEEE 

220 230 240 250 260 270 280



1197 1227 1257 1287 1317 1347 1377 1407 

TGKDIFTAGLKVYTNINTDAQKQljYDIYNSDTYIAYPNNELQIASTIMDATNGKVIAQLGGRHQNENISFGTNQSVLTDR 

II ===l 1= 1111== =111=1=1111=1 l=lll==lll=llll=l =111111111 111= hill Ihl 1=1 

TGYNLLTTGMDWTNVDQFAQKHL^ 

300 310 320 330 340 350 360 

1437 1467 1497 1527 1557 1587 1617 1647 

DWGSTMKPISAYAPAIDSGVYNSTGQSLNDSVYYWPGTSTQLYDWDRQYMGWMSMQTAIQQSRNVPAVRALEAAGLDEAK 

llllllllh 1111 = = llhll = = l I =111 I =1 = 111 I I = = = l hlllllllll I 11= II 
DWGSTMKPITDYAPALEYGVYDSTATI VHDEPYNYPGTDTPVYNWD^ 

380 390 400 410 420 430 440 

1677 1707 1737 1767 1797 1827 1857 1887 

SFLEKLGIYYPEMNYSIffilSSNNSSSDAKYGASSEKMAAAYSAFANGGTYYKPQ 

= 11 III II = = 11111111 = II llllllllllllhlllllllllll l = = l= 1111= == hllllll 
TFLNGLGIDYPSLHYSNAISSNTTESDKKYGASSEKMAAAYAAFANGGTYYKPMYIHKWFSDGSEKEFSNVGTRAMKET 

460 470 480 490 500 510 520 

1917 1947 1977 2007 2037 2067 2097 2127 

TAYMMTDMLKTVLTFGTGTKAAIPGVAQAGKTGTSNYTEDELAKIEATTGIYNSAVGTMAPDENFVGYTSKYTMAIWTGY 

mmihim =ii i=i = miniumm i i i =mi inn n 11 = 1111 

TAYMMTDMMKTVLVYGIGRGAYLPWLPQAGKTGTSNYTDEEIEK YIKNTGYVAPDEMFVGYTRKYAMAVWTGY 

540 550 560 570 580 590

2157 2187 2214 2244 2274 2304 2334 2364

KNRLTPLYGSQLDIATEVYRAMMSYLT-GGYSADWTMPEGLYRSGSYLYINGTTTTGTYSSSVYKNIYQNSGQSSQSSSS 

I I I I I I I I =1 I II =111111 = 1 = = = II =1 =11 = I =11 III 

SNRLTPLVGDGLWAAKOTRSMMTYLSEGSNPEDmiPEGLYTOraEFVFKNGARST--WSSPAPQQ--PPSTESSSSSSD 

610 620 630 640 650 660 670

2394 2424 2454 2484 2514 2544 2574 2604 

TSSEKQKEDKOTANDANSSSPQWTPNNGNATTPNNSNQTVPGT^^ 

=1= = = l== :::| I = 111= II 

SSTSQSNSTTPSTNNSTTTNPNNNTQQS--NTTPDQQNQNPQPAQP

690 700 710 

SEQ ID 8696 (GBS146) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 23 (lane 4; MW 82kDa), in Figure 168 (lane 11-13; MW 96.5kDa) and in Figure 
238 (lane 8; MW 96.5kDa). It was also expressed in E.coli as a GST-fusion product. SDS-PAGE analysis
of total cell extract is shown in Figure 49 (lane 2; MW 107kDa). 

Purified Thio-GBS146-His is shown in Figure 244, lane 4. 
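
As a rough plausibility check on band sizes like those quoted above, a protein's mass can be approximated from its length alone. The Python sketch below assumes an average residue mass of about 110 Da, which is a common rule of thumb and not a value taken from this document, and uses a hypothetical length of 728 residues, the last aligned GBS position shown in the alignments above. The expressed constructs may differ in length and carry fusion tags, so only approximate agreement with the reported sizes should be expected.

```python
# Assumption: rule-of-thumb average residue mass, not a value from the source.
AVG_RESIDUE_MASS_DA = 110.0


def approx_mass_kda(n_residues, avg_residue_mass_da=AVG_RESIDUE_MASS_DA):
    """Approximate molecular weight in kDa from residue count alone."""
    return n_residues * avg_residue_mass_da / 1000.0


# Hypothetical length: the last aligned GBS residue shown above is 728.
print(round(approx_mass_kda(728), 1))
```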

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 966

A DNA sequence (GBSx1025) was identified in S.agalactiae <SEQ ID 2951> which encodes the amino
acid sequence <SEQ ID 2952>. Analysis of this protein sequence reveals the following: 






Possible site: 37 

>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.3647 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA26957 GB:M90528 ORF [Streptococcus oralis] 
Identities = 143/196 (72%), Positives = 165/196 (83%), Gaps = 1/196 (0%) 

Query: 1 MVNYPHQLIRKTTVTKSKKKKIDFANRGMSFEAAINATNDYYLSHELAVIHKKPTPVQIV 60
MVNYPH++ + + K +FANRGMSFE INATNDYYLSH LAVIHKKPTP+QIV
Sbjct: 1 MVNYPHKISSQKEQAPPSQTK-NFANRGMSFEKMINATNDYYLSHGLAVIHKKPTPIQIV 59

Query: 61 KVDYPKRSRAKIVEAYFRQASTTDYSGVYKGYYIDFEAKETRQKTAMPMKNFHAHQIEHM 120
+VDYP+RSRAKIVEAYFRQASTTDYSGVY GYYIDFEAKETRQK A+PMKNFH HQI+HM
Sbjct: 60 RVDYPQRSRAKIVEAYFRQASTTDYSGVYDGYYIDFEAKETRQKHAIPMKNFHHHQIQHM 119

Query: 121 ANVLQQKGICFVLLHFSTLKETYLLPANELISFYQIDKGNKSMPIDYIRKNGFFVKESAF 180
VL Q+GICFVLLHF++ +ETYLLPA +LI FY DKG KSMP+ YIR+NG+ ++ AF
Sbjct: 120 EQVLAQRGICFVLLHFASQQETYLLPAVDLIRFYHQDKGQKSMPLGYIRENGYRIELGAF 179

Query: 181 PQVPYLDIIEEKLLGG 196
PQ+PYLDII+E LLGG
Sbjct: 180 PQIPYLDIIKEHLLGG 195



A related DNA sequence was identified in S.pyogenes <SEQ ID 2953> which encodes the amino acid
sequence <SEQ ID 2954>. Analysis of this protein sequence reveals the following: 



Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5030 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 166/199 (83%), Positives = 177/199 (88%)

Query: 1 MVNYPHQLIRKTTVTKSKKKKIDFANRGMSFEAAINATNDYYLSHELAVIHKKPTPVQIV 60
MVNYPH LIR+ + K+ K+DFANRGMSFEAAINATNDYYLS ++AVIHKKPTPVQIV
Sbjct: 1 MVNYPHNLIRQKVSSVQKQNKVDFANRGMSFEAAINATNDYYLSRQIAVIHKKPTPVQIV 60

Query: 61 KVDYPKRSRAKIVEAYFRQASTTDYSGVYKGYYIDFEAKETRQKTAMPMKNFHAHQIEHM 120
KVDYPKRSRAKIVEAYFRQASTTDY GVYKG+Y+DFEAKETRQKTAMPMKNFH HQIEHM
Sbjct: 61 KVDYPKRSRAKIVEAYFRQASTTDYCGVYKGHYVDFEAKETRQKTAMPMKNFHLHQIEHM 120

Query: 121 ANVLQQKGICFVLLHFSTLKETYLLPANELISFYQIDKGNKSMPIDYIRKNGFFVKESAF 180
A VL QKGICFVLLHFSTLKETY LPA LISFYQID G+KSMPIDYIRKNGF V AF
Sbjct: 121

Query: 181 PQVPYLDIIEEKLLGGDYN 199
PQVPYL+IIE+ LGGDYN
Sbjct: 181 PQVPYLNIIEQNFLGGDYN 199

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 967 

A DNA sequence (GBSx1026) was identified in S.agalactiae <SEQ ID 2955> which encodes the amino
acid sequence <SEQ ID 2956>. Analysis of this protein sequence reveals the following: 

Possible site: 61 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3227 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14136 GB:Z99115 similar to hypothetical proteins from B. subtilis [Bacillus subtilis]
Identities = 74/174 (42%), Positives = 97/174 (55%), Gaps = 6/174 (3%)

Query: 5 ILVTGYKNFELGIFQDKDPRITIIKKAIDKDFRRFLENGADWFIFMGNLGFEYWALEVAL 64 

+ +TGYK FELGIF+ D + IKKAI FL+ G +W + G LG E WA E A 

Sbjct: 4 LAITGYKPFELGIFKQDDKALYYIKKAIKNRLIAFLDEGLEWILISGQLGVELWAAEAAY 63 

Query: 65 DLQKEY-DFQIATIFTFENHGQNWNEANKAKL-ALFKQVDF-VKYTFPSYENPGQFKQYN 121

DLQ+EY D ++A I F +NW E NK + A+ Q D+ T YE+P QFKQ N 
Sbjct: 64 DLQEEYPDLKVAVITPFYEQEKNWKEPNKEQYEAVLAQADYEASLTHRPYESPLQFKQKN 123 

Query: 122 NHFLINNTQGAYLFYDSENETNLKFLLEMMEKK EAYDISFIVTFDRLNEIYEE 172

FI++G LYDEE+ K++L EK+ + Y I F+T D L EE 
Sbjct: 124 QFFIDKSDGLLLLYDPEKEGSPKYMLGTAEKRREQDGYPIYFITMDDLRVTVEE 177 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2957> which encodes the amino acid 
sequence <SEQ ID 2958>. Analysis of this protein sequence reveals the following:

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.3041 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 102/167 (61%), Positives = 127/167 (75%)

Query: 3 STILVTGYKNFELGIFQDKDPRITIIKKAIDKDFRRFLENGADWFIFMGNLGFEYWALEV 62 

+ IL+TGY++FE+GIF KDPR++IIK+AI KD +LENG DWFIF GNLGFE WALEV 
Sbjct: 2 TAILITGYRSFEIGIFDHKDPRVSIIKQAIRKDLIGYLENGVDWFIFTGNLGFEQWALEV 61 

Query: 63 ALDLQKEYDFQIATIFTFENHGQNWNEANKAKLALFKQVDFVKYTFPSYENPGQFKQYNH 122 

A +L++EY QIATIF FE HG WNE NK L+ F+ VDFVKY FP+YE P QF QY
Sbjct: 62 ANELKEEYPLQIATIFLFETHGDRWNEKNKEWLSQFRAVDFVKYYFPNYEQPTQFSQYYQ 121

Query: 123 FLINNTQGAYLFYDSENETNLKFLLEMMEKKEAYDISFLTFDRLNEI 169

FL+ T+GAY+FYD+ENETNLK+ L+ + Y + LTFDRLN++ 
Sbjct: 122 FLLEKTEGAYVFYDTENETNLKYFLKKAKDMPHYQLLLLTFDRLNDM 168 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 968 

A DNA sequence (GBSx1027) was identified in S.agalactiae <SEQ ID 2959> which encodes the amino
acid sequence <SEQ ID 2960>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 969 

A DNA sequence (GBSx1028) was identified in S.agalactiae <SEQ ID 2961> which encodes the amino
acid sequence <SEQ ID 2962>. This protein is predicted to be cell division protein DivIVA. Analysis of this 
protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have no N-terminal signal sequence 



A related GBS nucleic acid sequence <SEQ ID 9535> which encodes amino acid sequence <SEQ ID 9536> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Final Results 



bacterial cytoplasm --- Certainty=0.5188 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco



Final Results 



bacterial cytoplasm --- Certainty=0.2736 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco

>GP:CAB14135 GB:Z99115 ypsB [Bacillus subtilis]
Identities = 46/102 (45%), Positives = 69/102 (67%), Gaps = 14/102 (13%)

Query: 14 SPKDIFEQDFKVSMRGYDKKEVDVFLDDVIKDYENYLEQIEKLQMENRRLQQALDKKESE 73 

S K+I E++FK +RGY +++VD FLD +IKDYE + ++IE+LQ EN +L++ L+ E 
Sbjct: 9 SAKEILEKEFKTGVRGYKQEDVDKFLDMIIKDYETFHQEIEELQQENLQLKKQLE E 64 

Query: 74 ASNVRNSGTAMYNQKPIAQSATNFDILKRISRLEKEVFGRQI 115 

AS ++P+ + TNFDILKR+S LEK VFG ++ 

Sbjct: 65 AS KKQPVQSNTTNFDILKRLSNLEKHVFGSKL 96 
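
The percentages printed on an alignment summary line can be rechecked from the raw counts. The following is a minimal sketch, assuming the percentages are truncated rather than rounded, which matches the summary line of the alignment above (69/102 is printed as 67%). The helper names `parse_summary` and `truncated_percent` are illustrative and not part of any alignment tool.

```python
import re

# One "Name = n/total (p%)" field; tolerant of stray spaces left by OCR.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_summary(line):
    """Return {field: (count, total, reported_percent)} for one summary line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD.finditer(line)
    }

def truncated_percent(count, total):
    """Integer percentage truncated toward zero (assumed convention)."""
    return count * 100 // total

summary = "Identities = 46/102 (45%) , Positives = 69/102 (67%) , Gaps = 14/102 (13%)"
fields = parse_summary(summary)
for name, (count, total, reported) in fields.items():
    # Flag any field whose printed percentage disagrees with the counts.
    status = "ok" if truncated_percent(count, total) == reported else "check"
    print(name, count, total, reported, status)
```

A field marked `check` would point to a likely transcription error in either the counts or the percentage.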

A related DNA sequence was identified in S.pyogenes <SEQ ID 2963> which encodes the amino acid 
sequence <SEQ ID 2964>. Analysis of this protein sequence reveals the following: 

Possible site: 51 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.4466 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
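
A "Final Results" block such as the one above can be read mechanically. The sketch below is only an illustration: it assumes each row has the form `bacterial <location> ... Certainty=<value> (<label>)`, tolerates the stray spaces inside the numbers, and picks the location with the highest certainty. The function names are hypothetical.

```python
import re

# location, certainty (possibly with embedded spaces), and the bracketed label
ROW = re.compile(r"bacterial\s+(\w+).*?Certainty\s*=\s*([0-9. ]+?)\s*\(([^)]*)\)")

def parse_final_results(text):
    """Return a list of (location, certainty, label) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.search(line)
        if m:
            score = float(m.group(2).replace(" ", ""))  # "0 .4466" -> 0.4466
            rows.append((m.group(1), score, m.group(3).strip()))
    return rows

def best_location(rows):
    """Location with the highest certainty value."""
    return max(rows, key=lambda row: row[1])[0]

example = """\
bacterial cytoplasm Certainty=0 .4466 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
"""
rows = parse_final_results(example)
print(best_location(rows), rows)
```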

An alignment of the GAS and GBS proteins is shown below. 

Identities = 71/112 (63%) , Positives = 85/112 (75%) , Gaps = 6/112 (5%) 

Query: 8 MASIIYSPKDIFEQDFKVSMRGYDKKEVDVFLDDVIKDYENYLEQIEKLQMENRRLQQAL 67 

M SIIYSPKDIFEQ+FK SMRG+DKKEVD FLD+VIKDYEN+ QIE L+ EN +AL
Sbjct: 1 MTSIIYSPKDIFEQEFKTSMRGFDKKEVDEFLDNVIKDYENFNAQIEALKAEN EAL 56

Query: 68 DKKESEASNVRNSGTAMYNQKP--IAQSATNFDILKRISRLEKEVFGRQIRE 117

K + +A N ++ +P +AQSATNFDILKRIS+LEKEVFG+QI E 

Sbjct: 57 KKAKFQARNWSATVQQPVPQPTRVAQSATNFDILKRISKLEKEVFGKQIIE 108 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 970 

A DNA sequence (GBSx1029) was identified in S.agalactiae <SEQ ID 2965> which encodes the amino
acid sequence <SEQ ID 2966>. Analysis of this protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have no N-terminal signal sequence (or aa 1-19)



Final Results 

bacterial cytoplasm --- Certainty=0.0655 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14134 GB:Z99115 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 204/382 (53%), Positives = 274/382 (71%), Gaps = 3/382 (0%) 

Query: 3 ESFKLIATAAAGLEAIVGREIRNLGIDCQVENGRVRFHGDIKTIIETNLWLRAADRIKII 62 

+ + LIATA G+EA+V +E+R+LG +C+V+NG+V F GD I NLWLR ADRIK+ 
Sbjct: 2 KKYTLIATAPMGIEAWAKEWDLGYECKATONGKVIFEGDAIAICRANLWLRTADRIKVQ 61 

Query: 63 VGEFPAPTFEELFQGWGLDWENYLPLGAKFPXAKAKCVKSKLHNEPSVQAISKKAVAKK 122 

V FA TF+ELF+ ++W +++P KFP+ K VKS L + P Q I KKA+ +K 
Sbjct: 62 VASFKAKTFDELFEKTKAINWRSFIPENGKFPVT-GKSVKSTLASVPDCQRIVKKAIVEK 120 



Query: 123 LQKVFHRPEGVPLQENGAEFKIEVSILKDKATVMIDTTGSSLFKRGYRAEKGGAPIKENM 182 
L K+ ++E GAE+K+E+S+LKD+A + +D++G+ B KRGYR ++GGAPIKE + 
Sbjct: 121 L-KLQSGKANDWIEETGAEYKVEISLLKDQALITLDSSGTGLHKRGYRVDQGGAPIKETL 179

Query: 183 AAAIIQLSNWFPDKPLIDPTCGSGTFCIEAAMIGMNIAPGFNRDFAFEAWPWVDQSQVQK 242

AAA++QL+NW PD+P +DP CGSGT IEAA+IG NIAPGFNRDF E W W+ + K 
Sbjct: 180 AAALVQLTNTOPDRPFVDPFCGSGTIAIFJ^IGQNIAPGFNRDFVSEDWEWIGKDLWNK 239 

Query: 243 VRDEAESKANYDIDLDISGFDLDGRMVEIARKNAEEAGLGDVIKLKQMRLQDLKTDKING 302 

REE KANYD L I D+D RMV+ 1 A+ +NAEEAGLGD+ 1 + KQM+++D T+ G 
Sbjct: 240 ARLEVEEKANYDQPLTIFASDIDHRMVQIAKENAEEAGLGDLIQFKQMQVKDFTTNLEFG 299 

Query: 303 VIISNPPYGERLLDDKAVDILYNEMGQTFAPLKTWSKFILTSDEGFEKKYGSQADKKRKL 362

VI + NPPYGERL + KAV+ +Y EMGQ F PL TWS ++LTS+E FE+ YG +A KKRKL 
Sbjct: 300 VIVGNPPYGERLGEKKAVEQMYKEMGQAFEPLDTWSVYMLTSIffiNFEEAYGRKATKKRKL 359 

Query: 363 YNGTLKVDLYQYYGERVRRQVK 384

+NG +K D YQY+ +VR Q K
Sbjct: 360 FNGFIKTDYYQYW-SKVRPQRK 380

A related DNA sequence was identified in S.pyogenes <SEQ ID 2967> which encodes the amino acid 
sequence <SEQ ID 2968>. Analysis of this protein sequence reveals the following:

Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0324 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco

An alignment of the GAS and GBS proteins is shown below. 

Identities = 317/383 (82%), Positives = 354/383 (91%)

Query: 1 MKESFKLIATAAAGLEAIVGREIRNLGIDCQVENGRVRFHGDIKTIIETNLWLRAADRIK 60

MKE+F+L+ATAAAGLEA+VG+E+R LG DCQVENG+V F GD++ I++TNLWLRAADRIK
Sbjct: 1 MKETFRLVATAAAGLEAVVGKEVRALGFDCQVENGKVYFEGDVEAIVKTNLWLRAADRIK 60

Query: 61 IIVGEFPAPTFEELFQGVYGLDWENYLPLGAKFPIAKAKCVKSKLHNEPSVQAISKKAVA 120

IIVG+FPA TFEELFQGV+ LDWENYLPLGAKFPI+KAKCVKSKLHNEPSVQAI+KKAV
Sbjct: 61 IIVGQFPARTFEELFQGVFALDWENYLPLGAKFPISKAKCVKSKLHNEPSVQAITKKAVV 120

Query: 121 KKLQKVFHRPEGVPLQENGAEFKIEVSILKDKATVMIDTTGSSLFKRGYRAEKGGAPIKE 180

KKLQK FHRPEGVPLQE G+ F IEVSILKD+AT+MIDTTGSSLFKRGYR +KGGAPIKE 
Sbjct: 121 KKLQKHFHRPEGVPLQEVGSTFNIEVSILKDQATIMIDTTGSSLFKRGYRVQKGGAPIKE 180 

Query: 181 NMAAAIIQLSNWFPDKPLIDPTCGSGTFCIEAAMIGMNIAPGFNRDFAFEAWPWVDQSQV 240
NMAAAI+ LSNWFPDKPL+DPTCGSGTFCIEAAMIGMNIAPGFNR FAFE W WVD+ V

Sbjct: 181 NMAAAILALSNWFPDKPLVDPTCGSGTFCIEAAMIGMNIAPGFNRSFAFEEWSWVDKDIW 240

Query: 241 QKVRDEAESKANYDIDLDISGFDLDGRMVEIARKNAEEAGLGDVIKLKQMRLQDLKTDKI 300
Q+VRD+AE +ANY+I+LDISGFD+DGRM+EIA+ NAEEAGL DVI KQMRLQD +TDK+
Sbjct: 241 QQVRDDAEQEANYEIELDISGFDIDGRMIEIAKSNAEEAGLSDVITFKQMRLQDFRTDKV 300

Query: 301 NGVIISNPPYGERLLDDKAVDILYNEMGQTFAPLKTWSKFILTSDEGFEKKYGSQADKKR 360

NGV+ISNPPYGERLLDDKAVDILYNEMGQTFAPLKTWSKFILTSDE FE KYG +ADKKR
Sbjct: 301 NGVVISNPPYGERLLDDKAVDILYNEMGQTFAPLKTWSKFILTSDELFELKYGQKADKKR 360

Query: 361 KLYNGTLKVDLYQYYGERVRRQV 383

KLYNGTLKVDLYQ+YGERV+R + 
Sbjct: 361 KLYNGTLKVDLYQFYGERVKRHL 383 

SEQ ID 2966 (GBS255) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 43 (lane 7; MW 44kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 48 (lane 4; MW 69kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 971 

A DNA sequence (GBSx1030) was identified in S.agalactiae <SEQ ID 2969> which encodes the amino
acid sequence <SEQ ID 2970>. Analysis of this protein sequence reveals the following:

Possible site: 30 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-15.02 Transmembrane 171 - 187 ( 167 - 193) 

Final Results

bacterial membrane --- Certainty=0.7007 (Affirmative) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD16120 GB:AF094508 dentin phosphoryn [Homo sapiens]
Identities = 71/398 (17%), Positives = 152/398 (37%), Gaps = 16/398 (4%) 

Query: 16 TDGLEFKDAK-EMTVEEAWKDSEIKAGITEEDSILDKYIKQHRDEVASQKFETKSSDFA 74

+D + D+K + + E+ DS+ K+ ++ +S D S S 

Sbjct: 152 SDSSDSSDSKSDSSKSESDSSDSDSKSDSSDSNSSDSSDNSDSSDSSNSSNSSDSSDSSD 211 

Query: 75 NLDTASLDDFIKKQREELSAMLAAEELSKKLDNSVSQEQDTEANAVSPKEESSQEQENSV 134 
+ D++S D + S + S+ D+S S + D+ ++ S SS ++

Sbjct: 212 SSDSSSSSD--SSNSSDSSDSSDSSNSSESSDSSDSSDSDSSDSSDSSNSNSSDSDSSNS 269 

Query: 135 TPVPPLNTEAEPTATEPDSTIADSEEYKSSSKKRGGIVGTLIALILLLIVAIFGYNYFKN 194
+ + ++ + + S +DS + SS + + + N +

Sbjct: 270 SDSSDSSNSSDSSDSSDSSNSSDSSDSSDSSNSSDSSDSSDSS DSSDSSNSSDS 323

Query: 195 NNSTNSQTATSQSSSSKATTTSSEEDKKASQNLDNFNKSYANFFVDDKKTQLKNSEFDKL 254 

N+S+NS ++ S SS ++ +S D S + D+ N S D +S+ 

Sbjct: 324 NDSSNSSDSSDSSDSSDSSNSSDSSDSSDSSDSDSSNSS DSSNSSDSSDSCNS 376 


Query: 255 SELEKKVDALKGTKYYGKVKVKFDSLKRQIDAVKAVNDKFKSPAVVDGKKSEKLEVKDGA 314 

S+ D+G+ + + D++N S+ +S+D + 

Sbjct: 377 SDSSDSSDSSDGSDSDSSNRSDSSNSSDSSDSSDSSNSSDSSDSSDSNESSNSSDSSDSS 436 

Query: 315 NFDSLDSKTLNTGNASLDSLLHSIVSTGRNQVKQSEEQASSNKVSDTQITEQPNVTNGQS 374

N DS++SDSSSN S SSN + ++ N ++ + 
Sbjct: 437 NSSDSDSSDSSNSSDSSDSSNSSDSSESSNSSDNSNSSDSSNSSDSSDSSDSSNSSDSSN 496 

Query: 375 SSSAATINNQAAGTASGNLERNRSRVPYNNAAIADTGN 412 
SS ++ ++ + +S + + + S +++ +D+ +

Sbjct: 497 SSDSSNSSDSSDSNSSDSSDSSXSSDSSDSSDSSDSSD 534 
Identities = 64/341 (18%) , Positives = 140/341 (40%) , Gaps = 35/341 (10%) 

Query: 59 DEVASQKFETKSSDFANLDTASLDDFIKKQREELS-AMLAAEELSKKLDNSVSQEQDTEA 117 
D+ S K ++ SSD + D+++ D + S + +++ S D+S S + D+

Sbjct: 76 DKSDSGKGKSDSSDSDSSDSSNSSDSSDSSDSDSSDSNSSSDSDSSDSDSSDSSDSDSSD 135 

Query: 118 NAVSPKEESSQEQENSVTPVPPLNTEAEPTATEPDSTIADSEEYKSSSKKRGGIVGTLIA 177 
++ S S + +S +++++ + +E DS+ +DS+ S S 
Sbjct: 136 SSNSSDSSDSSDSSDSSDSSDSSDSKSDSSKSESDSSDSDSKSDSSDSN 184

Query: 178 LILLLIVAIFGYNYFKNNNSTNSQTATSQSSSSKATTTSSEEDKKASQNLDNFNKSYANF 237

+++S NS ++ S +SS+ + ++ S + +S + D+ N S ++
Sbjct: 185 SSDSSDNSDSSDSSNSSNSSDSSDSSDSSDSSSSSDSSNSSDSS- 228

Query: 238 FVDDKKTQLKNSEFDKLSELEKKVDALKGTKYYGKVKVKFDSLKRQIDAVKAVNDKFKSP 297
D +SE S+ D+ + DS D+ + N S
Sbjct: 229 ---DSSDSSNSSESSDSSD-SSDSDSSDSSDSSWSNSSDSDS-SNSSDSSDSSNSSDSSD 283

Query: 298 AVVDGKKSEKLEVKDGANFDSLDSKTLNTGNASLDSLLHSIVSTGRNQVKQSEEQASSNK 357

+ S+ + D +N S DS + + S DS S + N S+ SS+

Sbjct: 284 SSDSSNSSDSSDSSDSSN--SSDSSDSSDSSDSSDSSNSSDSNDSSNSSDSSDSSDSSDS 341

Query: 358 VSDTQITEQPNVTNGQSSSSAATINNQAAGTASGNLERNRS 398 

+ + ++ + ++ SS+S+ + N+ + + + + + S 
Sbjct: 342 SNSSDSSDSSDSSDSDSSNSSDSSNSSDSSDSCNSSDSSDS 382 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2971> which encodes the amino acid
sequence <SEQ ID 2972>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-14.70 Transmembrane 180 - 196 ( 175 - 202) 

Final Results

bacterial membrane --- Certainty=0.6880 (Affirmative) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases: 

>GP:AAF15293 GB:AF202180 erythrocyte membrane-associated giant
protein antigen 332 [Plasmodium falciparum]

Identities = 41/173 (23%) , Positives = 87/173 (49%) , Gaps = 10/173 (5%) 

Query: 1 VSEESKEVEVTKESQTLGLNEAKSMTIGEAVRKQSE IKAGVTKDDSILDKYIKQHR 56 

+ E + V + KE + GL+ + + ++V +Q+E I + K+ S ++ ++ 
Sbjct: 78 IEEAEENVWIEKEVEEEGLDNEEVIDEEDSVSEQAEEEVYINEEILKESSDVEDVKVENE 137

Query: 57 DEVS SQKFDAKYTELDTASLDNFI KKQREALSKAGLVDDEPVSAESAEQDSTLVEEV 113 

+EV+ + + LDN++ ++ E++++ +VD+ P S E E +S ++EE+ 

Sbjct: 138 LMNEEVNEETQSVAENNEEDKELDNYAA/EETESWEEVVVDEVPNSKEVQEIES-IIEEI 196 


Query: 114 AEDLAPMETTAWTGIPVEATVPVLDLDPSERVIPEPQMTKEEPKRDQFLSED 166 

ED + G +E V + D SE ++ E +T+E K++ ++ED 

Sbjct: 197 VEDGLTTDDLVGQQGSVIEEWEEVGSD- SEGTVEEAS ITEE VEKKES - VTED 247 

An alignment of the GAS and GBS proteins is shown below.

Identities = 234/506 (46%) , Positives = 304/506 (59%) , Gaps = 36/506 (7%) 

Query: 1 MSEDQKHPFFEPKKETDGLEFKDAKEMTVEEAVVKDSEIKAGITEEDSILDKYIKQHRDE 60
+SE+ K E KE+ L +AK MT+ EAVRK SEI KAG+T+ +DS I LDKYI KQHRDE
Sbjct: 1 VSEESKE--VEVTKESQTLGLNEAKSMTIGEAVRKQSEIKAGVTKDDSILDKYIKQHRDE 58

Query: 61 VASQKFETKSSDFANLDTASLDDFIKKQREELSAMLAAEELSKKLDNSVSQEQDTEANAV 120 

V+SQKF+ K + LDTASLD+FIKKQRE LS A + + ++ S EQD+ 
Sbjct: 59 VSSQKFDAK YTELDTASLDNFIKKQREALSK AGLVDDEPVSAESAEQDSTLVEE 112 


Query: 121 SPKEESSQEQENSVTPVPPLNT EAEPTATEP- -DSTIADSEEYKSS 164 

++ + E VT +P T E + T EP D +++ + + 

Sbjct: 113 VAEDLAPMETTAWTGIPVEATVPVLDLDPSERVIPEPQMTKEEPKRDQFLSEDSHHPAK 172

Query: 165 SKKRGGIVGTLIALILLLIVAIFGYNYFKNNNSTNSQTATSQSSSSKATTTSSEEDKKAS 224

+ G + L L+L ++ +FG+N+F +S + S+ + + T S+++ + 
Sbjct: 173 QNTKKGWLI7ALFLLLIAILAWFGWNHFLRQDSGKTTQTASKQTKTSLQTDSAKKATRLK 232 

Query: 225 QNLDNFNKSYANFFVDDKKTQLKNSEFDKLSELEKKVDALKGTKYYGKVKVKFDSLKRQI 284 
F K Y F+ D K++LKNS F L +LE + AL+G+ YY K K K DSLK+ I

Sbjct: 233 AAAKAFEKLYGTFYTDATKSKLKNSAFATLPDLEAALKALEGSAYYDKAKAKVDSLKKAI 292 



Query: 285 DAVKAVNDKFKSPAVVDGKKSEKLEVKDGANFDSLDSKTLNTGNASLDSLLHSIVSTGRN 344
A+ AVN KF S WDG+K EVK ANFD Ii S Hi GNA+LD++L + ++ GR
Sbjct: 293 AAITAWGKFVSDVVVDGEKVSA-EVKADANFDDLSSATLTIGNANLDAVLQASITEGRQ 351






Query: 345 QVKQSEEQASSNKVSDTQITEQPNVTNGQSSSSAATINNQAAGTAS GNLERNRSRVP 401 

Q+ E A K ++ Q Q GQS+S A + G S +L+R+ SRVP 

Sbjct: 352 QLASKAEAA KAANEQAV- QDQAAQGQSTSVAPS GYGLTSYDPASLQRHLSRVP 403



Query: 402 YNNAAIADTGNPAWIFNPGVLEKIVATSQARGYFSGNNYILEPVNIINGNGYYNMFKLDG 461

YN IAD NP+W FNPGVLEKIVATSQARGY SGN YILEPVNI INGNGYYNMFK DG 
Sbjct: 404 YNQDVIADRANPSWAFNPGVLEKIVATSQARGYISGNQYILEPVNIINGNGYYNMFKPDG 463 






Query: 462 TYLFSINAKTGYFVGNAPGRADSLDY 487

TYLFSIN KTGYFVGN G AD+LDY
Sbjct: 464 TYLFSINCKTGYFVGNGKGYADALDY 489



SEQ ID 2970 (GBS351) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 73 (lane 2; MW 57kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 5; MW 82kDa). 

GBS351-GST was purified as shown in Figure 216, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 972 

A DNA sequence (GBSx1031) was identified in S.agalactiae <SEQ ID 2973> which encodes the amino
acid sequence <SEQ ID 2974>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3169 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco

A related DNA sequence was identified in S.pyogenes <SEQ ID 2975> which encodes the amino acid
sequence <SEQ ID 2976>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3169 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco



An alignment of the GAS and GBS proteins is shown below. 



Identities = 129/160 (80%), Positives = 149/160 (92%)

Query: 1 MTKEVVVESFELDHTIVKAPYVRLISEEVGPVGDIITNFDIRLIQPNENAIDTAGLHTIE 60

MTKEV+VESFELDHTIVKAPYVRLISEE GP GD ITNFD+RL+QPN+N+I+TAGLHTIE
Sbjct: 1 MTKEVIVESFELDHTIVKAPYVRLISEEFGPKGDRITNFDVRLVQPNQNSIETAGLHTIE 60

Query: 61 HLLAKLIRQRINGLIDCSPFGCRTGFHMIMWGKQDATEIAKVIKSSLEAIAGGVTWEDVP 120

HLLAKLIRQRI+G+IDCSPFGCRTGFH+IMWGK +T+IAKVIKSSLE IA G+TWEDVP
Sbjct: 61 HLLAKLIRQRIDGMIDCSPFGCRTGFHLIMWGKHSSTDIAKVIKSSLEEIATGITWEDVP 120

Query: 121 GTTIESCGNYKDHSLHSAQEWAKLILSQGISDNAFERHIV 160

GTT+ESCGNYKDHSL +A+EWA+LI+ QGISD+ F RH++
Sbjct: 121 GTTLESCGNYKDHSLFAAKEWAQLIIDQGISDDPFSRHVI 160

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the databases:

>GP:AAF34762 GB:AF228345 unknown [Listeria monocytogenes] 
Identities = 299/534 (55%) , Positives = 408/534 (75%) , Gaps = 14/534 (2%) 

Query: 2 VNIILLIVSALIGLILGYALISIRLKSAKEARELTLLNAEQEAVDIRGKAEVDAEHIKKT 61
+ I + I+S+L+ LI+G + S+ KS+ E++ RG AE+ I +

Sbjct: 1 MTIAITIISSLLFLIVGLWGSLIFKSS TEKKLAAARGTAEL IVED 46 

Query: 62 AKRESKANRKELLLEAKEEARKYREEIEQEFKSERQELKQLETRLAERSLTLDRKDENLS 121 
AK+E++ +KE LLEAKEE + R EIE E + R E ++ E RL +R LDRKD +LS 
Sbjct: 47 AKKEAETTKKEALLEAKEENHRLRTEIENELRGRRTETQKAENRLLQREENLDRKDTSLS 106

Query: 122 SKEKVLDSKEQSLTDKSKHIDERQLQVEKLEEEKKAELEKVAAMTIAEAREVILMETENK 181 

+E L+ KE+S++ + + I+E++ ++ ++ + ++ ELE+++A++ EA+ +IL + E + 
Sbjct: 107 KREATLERKEESISKRQQQIEEKESKLAEMIQAEQTELERISALSKEEAKSIILNQVEEE 166 


Query: 182 LTHEIATRIRDAERDIKDRTVKTAKDLLAQAMQRLAGEYVTEQTITSVHLPDDNMKGRII 241 

LTH+ A ++++E K+ + K AK++L+ A4QR A ++V E T++ V LP+D MKGRII 
Sbjct: 167 LTHDTAIMVKESENRAKEESDKKAKNILSIAIQRCAADHVAETTVSVVTLPNDEMKGRII 226

Query: 242 GREGRNIRTLESLTGIDVIIDDTPEVVILSGFDPIRREIARMTLESLIADGRIHPARIEE 301

GREGRNIRTLE+LTGID+ 1 IDDTPE VILSGFDPIRREIAR+ LE L+ DGRIHPARIEE 
Sbjct: 227 GREGRNIRTLETLTGIDLIIDDTPEAVILSGFDPIRREIARIALEKLVQDGRIHPARIEE 286 

Query: 302 LVEKNRLEMDNRIREYGEAAAYEIGAPNLHPDLIKIMGRLQFRTSFGQNVLRHSVEVGKL 361 
+V+K R E+D IRE GE A +E+G ++HPDLIKI+GRL++RTS+GQNVL HS+EV KL

Sbjct: 287 MVDKARKEVDEHIREVGEQATFEVGIHSIHPDLIKILGRIJIYRTSYGQNVLNHSLEVSKL 346 

Query: 362 AGIIAGELGENVMiARRAGFLHDMGKAIDREVEGSHVEIGMEFARKYKEHPTWVNTIASH 421 
AGILAGELGE+V IA+RAG LHD+GKAID E+EGSHVEIG+E A KYKE+ W+N+IASH 
Sbjct: 347 AGILAGELGEDVTLAKRAGLLHDIGKAIDHEIEGSHVEIGVELATKYKENDVVINSIASH 406

Query: 422 HGDVEPDSVIAVLVAAADALSSARPGARNESMENYIKRLRDLEEIATSFDGVQNSFALQA 481

HGD E SVI AVL VAAADALS +ARPGAR+E++ENYI +RL LEEI+ S+DGV+ S+A+QA 
Sbjct: 407 HGDTEATSVIAVLVAAADALSAARPGARSETL.ENYIRRLEKLEEISESYDGVEKSYAIQA 466 


Query: 482 GREIRIMVQPEKISDDQWILSHKVREKIENNLDYPGNIKVTVIREMRAVDYAK 535 

GRE+RI+V+P+ ID L+ +R++IE LDYPG+IKVTVIRE RAV+YAK 

Sbjct: 467 GREVRIIVEPDTIDDLSSYRLARDIRKRIEEELDYPGHIKVTVIRETRAVEYAK 520

An alignment of the GAS and GBS proteins is shown below.

Identities = 451/535 (84%), Positives = 503/535 (93%)

Query: 1 MFNIILAMVCALIGLIIGYVAISMKMKSSKEAAELTLLNAEQDAVDLRGKAEIEAEHIRK 60 
M NIIL +V ALIGLI+GY IS+++KS+KEAAELTLLNAEQ+AVD+RGKAE++AEHI+K 
Sbjct: 1 MVNIILLIVSALIGLILGYALISIRLKSAKEAAELTLLNAEQEAVDIRGKAEVDAEHIKK 60

Query: 61 AAERESKAHQKELLLEAKEEARKYREEIEKEFKSDRQELKQMEARLTDRASSLDRKDENL 120 

A+RESKA++KELLLEAKEEARKYREEIE+EFKS+RQELKQ+E RL +R+ +LDRKDENL 
Sbjct: 61 TAKRESKANRKELLLEAKEEARKYREEIEQEFKSERQELKQLETRLAERSLTLDRKDENL 120 


Query: 121 SNKEKMLDSKEQSLTDKSRHINEREQEIATLETKKVEELSRIAELSQEEAKDIILADTEK 180 

S+KEK+LDSKEQSLTDKS+HI +ER+ ++ LE +K EL ++A ++ EA+++IL +TE 
Sbjct: 121 SSKEKVLDSKEQSLTDKSKHIDERQLQVEKLEEEKKAELEKVARMTIAEAREVILMETEN 180

Query: 181 DIAHDIATRIKEAEREVKDRSNKIAKDLLAQAMQRLAGEYVTEQTITTVHLPDDNMKGRI 240

L H+IATRI++AER++KDR+ K AKDLLAQAMQRLAGEYVTEQTIT+VHLPDDNMKGRI
Sbjct: 181 KLTHEIATRIRDAERDIKDRTVKTAKDLLAQAMQRLAGEYVTEQTITSVHLPDDNMKGRI 240

Query: 241 IGREGRNIRTLESLTGIDVIIDDTPEVVVLSGFDPIRREIARMTLESLIQDGRIHPARIE 300
IGREGRNIRTLESLTGIDVIIDDTPEVV+LSGFDPIRREIARMTLESLI DGRIHPARIE

Sbjct: 241 IGREGRNIRTLESLTGIDVIIDDTPEVVILSGFDPIRREIARMTLESLIADGRIHPARIE 300

Query: 301 ELVEKNRLEMDQRIREYGEAAAYEIGAPNLHPDLIKIMGRLQFRTSYGQNVLRHSVEVGK 360

ELVEKNRLEMD RIREYGEAAAYEIGAPNLHPDLIKIMGRLQFRTS+GQNVLRHSVEVGK
Sbjct: 301 ELVEKNRLEMDNRIREYGEAAAYEIGAPNLHPDLIKIMGRLQFRTSFGQNVLRHSVEVGK 360

Query: 361 LAGILAGELGENVDLARRAGFLHDMGKAIDREVEGSHVEIGMEFARKYKEHPIVVNTIAS 420

LAGILAGELGENV LARRAGFLHDMGKAIDREVEGSHVEIGMEFARKYKEHP+VVNTIAS
Sbjct: 361 LAGILAGELGENVALARRAGFLHDMGKAIDREVEGSHVEIGMEFARKYKEHPVVVNTIAS 420

Query: 421 HHGDVEPDSVIAVIVAAADALSSARPGARNESMENYIKRLRDLEEIANGFEGVQNAFALQ 480

HHGDVEPDSVIAV+VAAADALSSARPGARNESMENYIKRLRDLEEIA F+GVQN+FALQ 
Sbjct: 421 HHGDVEPDSVIAVLVAAADALSSARPGARNESMENYIKRLRDLEEIATSFDGVQNSFALQ 480 

Query: 481 AGREIRIMVQPGKVSDDQWIMSHKVREKIEQNLDYPGNIKVTVIREMRAVDFAK 535 
AGREIRIMVQP K+SDDQWI+SHKVREKIE NLDYPGNIKVTVIREMRAVD+AK

Sbjct: 481 AGREIRIMVQPEKISDDQWILSHKVREKIENNLDYPGNIKVTVIREMRAVDYAK 535

SEQ ID 2978 (GBS86) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 7 (lane 6; MW 59kDa). It was also expressed in E.coli as a GST-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 13 (lane 5; MW 84kDa).

GBS86-GST was purified as shown in Figure 192, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 974 

A DNA sequence (GBSx1033) was identified in S.agalactiae <SEQ ID 2981> which encodes the amino
acid sequence <SEQ ID 2982>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4984 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 975 

A DNA sequence (GBSx1034) was identified in S.agalactiae <SEQ ID 2983> which encodes the amino
acid sequence <SEQ ID 2984>. Analysis of this protein sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.87 Transmembrane 146 - 162 ( 146 - 162)

Final Results

bacterial membrane --- Certainty=0.2147 (Affirmative) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

A related GBS nucleic acid sequence <SEQ ID 8697> which encodes amino acid sequence <SEQ ID 8698>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: -10.72
GvH: Signal Score (-7.5): -5.66

Possible site: 29
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -2.87 threshold: 0.0

INTEGRAL Likelihood = -2.87 Transmembrane 138 - 154 ( 138 - 154)
PERIPHERAL Likelihood = 3.76 51

modified ALOM score: 1.07

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2147 (Affirmative) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAG21390 GB:AF302051 ABC transporter ATP binding subunit 
[Bacillus licheniformis] 
Identities = 84/218 (38%) , Positives = 138/218 (62%) , Gaps = 1/218 (0%) 

Query: 12 DIIKVDHIFKSIGQKTILEDISFSIASNQCVALIGPNGAGKTTLMSTLLGDISISSGSLT 71

+++ + ++ K+ QKT ++ I FSI + VA++GPNGAGKTT +S +LG + ++G++T 
Sbjct: 3 NWSLTNVTKTFRQKTAVDQIDFSIKKGEIVAILGPNGAGKTTTISMILGLLKPTAGNIT 62 

Query: 72 IFNLPAHHNRLKYKVAILPQE-NVLPSKFTVRELIDFQRCLFPEVLPMSLILDYLQWSDT 130 
+F+ H R++ K+ + QE +V+P E+I+ R +P+ L + +D

Sbjct: 63 LFDSMPHEKRWEKIGTMLQEVSVMPGLRCRVEIIELIRSYYPKPLSFQKLRTLTGLTDK 122 

Query: 131 HLQQFTETLSGGQKRLLAFVLTLVGKPQLLFLDEPTSGMDTSTRQRFWELIATLKKEGVT 190 
L+ E LSGGQKR L F L L G P+L+ DEPT GMD ++R RFW+ + +L ++G T 
Sbjct: 123 DLKTQAEKLSGGQKKRLGFAIJUjAGDPELMIFDEPTVGMDITSRNRFWQTVQSLAEQGKT 182

Query: 191 IVYSSHYIEEVEHTADRILVLHKGKLLRDTTPLCHEAR 228 

I++S+HY++E + A RIL+ GK++ D TPL ++R 
Sbjct: 183 IIFSTHYLQEADDAAQRILLFKDGKIVADGTPLQIKSR 220


There is also homology to SEQ ID 686. 

SEQ ID 8698 (GBS350) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 72 (lane 13; MW 28.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 4; MW 54kDa). 
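
The His-fusion and GST-fusion molecular weights reported in this section can be compared directly. The short sketch below only restates the figures given for GBS255, GBS351, GBS86 and GBS350 and computes the difference for each pair; reading the near-constant difference as the extra mass of the GST tag is an inference made here, not a statement from the source.

```python
# (His-fusion MW, GST-fusion MW) in kDa, as reported in the text.
reported_mw = {
    "GBS255": (44.0, 69.0),
    "GBS351": (57.0, 82.0),
    "GBS86": (59.0, 84.0),
    "GBS350": (28.9, 54.0),
}

# Difference between the two fusion products for each protein.
shifts = {
    name: round(gst - his, 1) for name, (his, gst) in reported_mw.items()
}

for name, shift in shifts.items():
    print(f"{name}: GST-fusion is {shift} kDa heavier than His-fusion")
```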

GBS350-GST was purified as shown in Figure 226, lane 7.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 976 

A DNA sequence (GBSx1035) was identified in S.agalactiae <SEQ ID 2985> which encodes the amino
acid sequence <SEQ ID 2986>. Analysis of this protein sequence reveals the following:

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2913 (Affirmative) < suco
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 977 

A DNA sequence (GBSx1036) was identified in S.agalactiae <SEQ ID 2987> which encodes the amino
acid sequence <SEQ ID 2988>. Analysis of this protein sequence reveals the following:

Possible site: 31 

>>> Seems to have an uncleavable N-term signal seq 



INTEGRAL Likelihood =-10.51 Transmembrane 222 - 238 ( 214 - 241)
INTEGRAL Likelihood = -6.90 Transmembrane 104 - 120 ( 101 - 125)
INTEGRAL Likelihood = -5.84 Transmembrane 140 - 156 ( 138 - 159)
INTEGRAL Likelihood = -5.20 Transmembrane 19 - 35 ( 18 - 41)
INTEGRAL Likelihood = -1.28 Transmembrane 164 - 180 ( 164 - 180)

Final Results 

bacterial membrane --- Certainty=0.5203 (Affirmative) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB69806 GB:AJ243712 YVFS protein [Bacillus cereus]

Identities = 73/239 (30%) , Positives = 127/239 (52%) , Gaps = 4/239 (1%) 

Query: 9 KMEFLLTKRQLANLimiGMPVAFFLFFSGFMGEGLTKAIEAIYVRNYMITMAGFSSLSF 68 
K+E L T R + ++ MPV F+ F+ + + +Y+I+MA FS + 

Sbjct: 4 KI E I LRTFRNKLFI FFSLLMPVMFYY I FTNWQ VPQNGDAWKAHYLI SMATFS I VGT 60

Query: 69 AFFTFPFSMKDDQLSNRMQLLRHSPVPMWQYYLAKIIRILFYYCLAITWFLTGHILRQV 128 

A F+F + ++ LL+ +P+P Y AKII +1 V+F+ G ++ V 

Sbjct: 61 ALFSFGVRLSQERGQGWTHLLKITPLPEGAYLTAKIIAQTWNAFSILVIFIAGILINHV 120 

Query: 129 SMPIEQWMQSFLLLLGGATCFIPFGLLVSYFKNTELMSMVANICYMSLAVLGGMWMPITM 188

+ I QW+ + L LL G T F+ G ++ K + + +ANI MSLA++GG+WMPI + 
Sbjct: 121 ELTIGQWIGAGLWLLLGVTPFIMjGTVIGSIKKADAAAGLANILNMSLAIVGGLWMPIEV 180 

Query: 189 FPKWLQALSKLTPTYHLTQVILSPFANSFAGF-SLIILIGYGIIMLVIAYLLSQKRHSI 246

FPK L+ + + TPTYH A G+ ++ +L GY +1 +V++ + +++ ++ 

Sbjct: 181 FPKILRTIGEWTPTYHFGSGAWDIVAGKSIGWENIAVLGGYFLIFVWSIYIRKRQEAV 239 

There is also homology to SEQ ID 682 and to SEQ ID 1628. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 978 

A DNA sequence (GBSx1037) was identified in S.agalactiae <SEQ ID 2989> which encodes the amino
acid sequence <SEQ ID 2990>. This protein is predicted to be a histidine kinase. Analysis of this protein
sequence reveals the following:

Possible site: 49 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -7.43 Transmembrane 105 - 121 ( 102 - 124) 
INTEGRAL Likelihood = -6.95 Transmembrane 130 - 146 ( 129 - 149)

Final Results

bacterial membrane --- Certainty=0.3972 (Affirmative) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco
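
The INTEGRAL lines reported above for this protein can be turned into structured data for comparison across examples. This is a rough sketch under stated assumptions: each line is taken to give a likelihood, a predicted transmembrane range, and a second range in parentheses. The keys `core` and `extended` are labels chosen here for those two ranges, not terms from the source.

```python
import re

# "INTEGRAL Likelihood = -7.43 Transmembrane 105 - 121 ( 102 - 124)"
TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return one dict per INTEGRAL line found in the text."""
    return [
        {
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),
        }
        for m in TM.finditer(text)
    ]

sample = """\
INTEGRAL Likelihood = -7.43 Transmembrane 105 - 121 ( 102 - 124)
INTEGRAL Likelihood = -6.95 Transmembrane 130 - 146 ( 129 - 149)
"""
segments = parse_segments(sample)
for seg in segments:
    start, end = seg["core"]
    # Length of the predicted range, counting both end residues.
    print(seg["likelihood"], seg["core"], end - start + 1)
```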

A related GBS nucleic acid sequence <SEQ ID 9537> which encodes amino acid sequence <SEQ ID 9538> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB54584 GB:AJ006400 histidine kinase [Streptococcus pneumoniae] 
Identities = 138/350 (39%) , Positives = 212/350 (60%) , Gaps = 3/350 (0%) 

Query: 11 MYFIPLVFLIYPIGGILYYHYPFWTLFFTLAFVGAYLYSVIIRGESKYHMIAWSTMLTYI 70 

M++1 L+F+I+PI ++ W L + FV AYL V+ + + W MLTY+ 

Sbjct: 11 MFWISLIFMIFPILSWTGWLSAWHLLIDILFWAYL-GVLTTKSQRLSWLYWGLMLTYV 69 

Query: 71 FYMTIFINSGFIWYIYFLSNLLA7YRFRDK-LKSFRFISFACTLATWF-LCFFKASDFGD 128 

T F+ +IW+ +FLSNLL Y F + LKS +F W L F+ + 

Sbjct: 70 VGNTAFVAvNYIWFFFFLSNLLSYHFSVRSLKSLHvWTFLLAQvLVVGQLLIFQRIEVEF 129 

Query: 129 RIMFLIVPIFCIGYMWIAIENRNSEEQREKIAEQNQYINILSAENERNRIGRDLHDSLGH 188 

L++ F + + R E+ +E +QN IN+L AENER+RIG+DLHDSLGH 

Sbjct: 130 LFYLLVILTFVDLMTFGLVRIRIVEDLKEAQVKQNAQINLLIAENERSRIGQDLHDSLGH 189

Query: 189 TFAMTLKTEIiALKLLEKRNyDK^QKELSELNHISHQSMSEWQIVSNLKYRTVVEEIDE 248 

TFAM+++KT+LAL+L + Y +V+KEL E++ IS SM+EVR IV NLK RT+ E++ 
Sbjct: 190 TFAMLSVKTDIALQLFQMEAYPQVEKELKEIHQISKDSMNEVRTIVENLKSRTLTSELET 249 

Query: 249 LYRLFQLSNIKLTvVNKLETSQLSPVTQSTITMILKELSNNIVKHAFADSVELSLVRQGA 308 

+ ++ +++ I++ V N L+ S L+ +ST +MIL EL NI+KHA+A V L L R 
Sbjct: 250 VKKMLEIAGIEVQVENHLDKSSLTQELESTASMILLELVTNIIKHAKASKVYLKLERTEK 309 

Query: 309 TINIEMIDNGCGFTNLDGDELHSIQERLTIVEGTLTILSRSKPTHIQWL 358 

+ + + D+GCGF ++ GDELH+++ R+ G ++++S+ PT +QV L 
Sbjct: 310 ELILTWDDGCGFASISGDELHTVRNRVFPFSGEVSVISQKHPTEVQVRL 359 

There is also homology to SEQ ID 2992. 

A related GBS gene <SEQ ID 8699> and protein <SEQ ID 8700> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 

McG: Discrim Score: 10.90 

GvH: Signal Score (-7.5): -2.42 
Possible site: 49 

>>> Seems to have a cleavable N-term signal seq.

ALOM program count: 2 value: -7.43 threshold: 0.0 

INTEGRAL Likelihood = -7.43 Transmembrane 105 - 121 ( 102 - 124) 
INTEGRAL Likelihood = -6.95 Transmembrane 130 - 146 ( 129 - 149) 
PERIPHERAL Likelihood =0.16 61 
modified ALOM score: 1.99 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.3972 (Affirmative) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 979

A DNA sequence (GBSx1038) was identified in S.agalactiae <SEQ ID 2993> which encodes the amino
acid sequence <SEQ ID 2994>. This protein is predicted to be response regulator. Analysis of this protein 
sequence reveals the following: 

Possible site: 28

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 49 - 65 ( 49 - 65)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < suco
bacterial outside --- Certainty=0.0000 (Not Clear) < suco
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < suco

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB54585 GB:AJ006400 response regulator [Streptococcus pneumoniae]

Identities = 95/153 (62%) , Positives = 125/153 (81%) , Gaps = 3/153 (1%) 

Query: 1 MKLLVAEDQSMLRDAMCQLLLMEESVSTIDQAGNGGEAIAILSNKAIDVAILDVEMPILS 60 
MK+LVAEDQSMLRDAMCQLL+++ V ++ QA NG EAI +L +++D+AILDVEMP+ +
Sbjct: 1 MKVLVAEDQSMLRDAMCQLLMLQPDVESVFQAKNGQEAIQLLEKESVDIAILDVEMPVKT 60

Query: 61 GLDVLEWVRKYQ-NVKVIIVTTFKRSGYFQRAIRSNVDAYVLKDRSVADLMKTIQKVLSG 119

GL+VLEW+R + KV++VTTFKR GYF+RA+++ VDAYVLK+R+ +ADLM+T+ VL G 
Sbjct: 61 GLEVLEWIRAEKLETKVVVVTTFKRPGYFERAVKAGVDAYVLKERNIADLMQTLHTVLEG 120 

25 

Query: 120 GKEYSPELMENVI - -SNPLSEQEIKILSLIAQG 150 

KEYSPELME V+ NPL+EQEI +L IAQG 
Sbjct: 121 RKEYSPELMEWMMHPNPLTEQE1AVLKBIAQG 153 

There is also homology to SEQ ID 2996.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
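
Each alignment in these Examples is summarized by a line of the form "Identities = a/n (p%), Positives = b/n (q%), Gaps = c/n (r%)". As a hedged illustration only, the sketch below extracts the raw counts so that the underlying fractions can be recomputed; `parse_summary` and `fraction` are hypothetical helper names, and no assumption is made about how the printed percentages were rounded.

```python
import re

# Matches "<Label> = <count>/<aligned length>", allowing OCR-style spacing.
_PAIR = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {label: (count, aligned_length)} for one summary line."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)))
        for m in _PAIR.finditer(line)
    }


def fraction(counts):
    """Convert a (count, aligned_length) pair to a fraction."""
    count, length = counts
    return count / length
```

For the response regulator alignment above, `parse_summary` yields 95 identities, 125 positives and 3 gaps over 153 aligned positions.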

Example 980 

A DNA sequence (GBSx1039) was identified in S.agalactiae <SEQ ID 2997> which encodes the amino
acid sequence <SEQ ID 2998>. Analysis of this protein sequence reveals the following:



Possible site: 34

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.3675 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB85965 GB:AE000909 unknown [Methanothermobacter thermoautotrophicus]
Identities = 46/183 (25%), Positives = 81/183 (44%), Gaps = 11/183 (6%)

Query: 5 KERFDTLSDAILAIAMTILVLEI KTPATMGDIGDFTRNIGLFIVSFWVFNFW 57
         K+R + L DAI AIAMTILVL I PA I ++ + +SF+++ FW
Sbjct: 6 KKRLEGLVDAIFAIAMTILVLGIDVPTGTMSVPAMDAYIMGLASDLYSYCLSFLLLGVFW 65
Query: 58 YERAQNSLDAQKTNDEIIALDIIEHLGICLIPLFTKFMISFENHNFAVMAYGLLTLLVGL 117

+ + +K + I ++I+ + + L+P TK ++ + + + L L +GL 

Sbjct: 66 WViraMHFEKLEKVDTGFIWINIWLMVVVLVPFSTKLTGNYGDLVTPNILFHLNMLTIGL 125 


Query: 118 TSDIIRIRLASYDL VTI PSELKERVI KVMTTFAIRSVWRFI III LAYFLPE VGI FAYLV 177

+ 1 L+ I ++K + ++ + +IL PE AY V 
Sbjct: 126 LLSMSWIYTQRNGLMDIGENEYRLILKKNLLMPIAAI LALILTPIAPEYSSTAYAV 181 

Query: 178 IPL 180

+ L 

Sbjct: 182 LIL 184 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 981 

A DNA sequence (GBSx1040) was identified in S.agalactiae <SEQ ID 2999> which encodes the amino
acid sequence <SEQ ID 3000>. This protein is predicted to be guanylate kinase (gmk). Analysis of this
protein sequence reveals the following:

Possible site: 16

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13441 GB:Z99112 similar to guanylate kinase [Bacillus subtilis]
Identities = 121/202 (59%), Positives = 155/202 (75%)

Query: 1 MSERGLLIVFSGPSGVGKGTVRQEIFSTPDHKFDYSVSMTTRPQRPGEVDGVDYFFRTRE 60
M ERGLLIV SGPSGVGKGTVRQ IFS D KF+YS+S+TTR R GEV+GVDYFF+TR+ 



EFE +1 + ++LE+AEYVGNYYGTP+ YV +TL G DVFLEIEVQGALQV++ P+G+F 



IFL PP L EL+ R+V RGT++ +1 R++ AK EI +M YDY V ND V A +++K 



++ AEH + +RV RY M++ 
\IVLAEHLKRERVAPRYKKMLE 242 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3001> which encodes the amino acid
sequence <SEQ ID 3002>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the databases:

>GP:CAB13441 GB:Z99112 similar to guanylate kinase [Bacillus subtilis]
Identities = 123/203 (60%), Positives = 157/203 (76%)

Query: 1 MSERGLLIVFSGPSGVGKGTVRQEIFSTPDHKFEYSVSMTTRPQRPGEVDGVDYFFRTRE 60
         M ERGLLIV SGPSGVGKGTVRQ IFS D KFEYS+S+TTR R GEV+GVDYFF+TR+
Sbjct: 41 MKERGLLIVLSGPSGVGKGTVRQAIFSQEDTKFEYSISVTTRSPREGEVNGVDYFFKTRD 100

Query: 61 EFEELIKTGQMLEYAEYVGNYYGTPLTYVNETLDKGIDVFLEIEVQGALQVKSKVPDGVF 120
          EFE++I ++LE+AEYVGNYYGTP+ YV +TL G DVFLEIEVQGALQV++ P+G+F
Sbjct: 101 EFEQMIADNKLLEWAEYVGNYYGTPVDYVEQTLQDGKDVFLEIEVQGALQVENAFPEGLF 160

Query: 121 VFLTPPDLDELEDRLVGRGTDSQEVIAQRIERAKEEIALMREYDYAVVNDEVALAAERVK 180
           +FL PP L EL++R+V RGT++ +I R++ AK EI +M YDY V ND V A +++K
Sbjct: 161 IFIAPPSLSELKITOIvTRGTETDALIENRMKAAKAEIEMMDAYDYVVENDNVETACDKIK 220

Query: 181 RIIETEHFRVERVIGRYDKMIKI 203
           I+ EH + ERV RY KM+++
Sbjct: 221 AIVLAEHLKRERVAPRYKKMLEV 243

An alignment of the GAS and GBS proteins is shown below. 

Identities = 186/204 (91%), Positives = 197/204 (96%) 

Query: 1 MSERGLLIVFSGPSGVGKGTVRQEIFSTPDHKFDYSVSMTTRPQRPGEVDGVDYFFRTRE 60
         MSERGLLIVFSGPSGVGKGTVRQEIFSTPDHKF+YSVSMTTRPQRPGEVDGVDYFFRTRE
Sbjct: 1 MSERGLLIVFSGPSGVGKGTVRQEIFSTPDHKFEYSVSMTTRPQRPGEVDGVDYFFRTRE 60

Query: 61 EFEALIKEGQMLEYAEYVGNYYGTPLSYVNETLDRGIDVFLEIEVQGALQVKSKVPDGVF 120
          EFE LIK GQMLEYAEYVGNYYGTPL+YVNETLDKGIDVFLEIEVQGALQVKSKVPDGVF
Sbjct: 61 EFEELIKTGQMLEYAEYVGNYYGTPLTYVNETLDKGIDVFLEIEVQGALQVKSKVPDGVF 120

Query: 121 IFLTPPDLEELEERLVGRGTDSPEVIAQRIERAKEEIALMREYDYAVVNDQVSLAAERVK 180
           +FLTPPDL+ELE+RLVGRGTDS EVIAQRIERAKEEIALMREYDYAVVND+V+LAAERVK
Sbjct: 121 VFLTPPDLDELEDRLVGRGTDSQEVIAQRIERAKEEIALMREYDYAVVNDEVALAAERVK 180

Query: 181 RVIEAEHYRVDRVIGRYTNMVKET 204
           R+IE EH+RV+RVIGRY M+K T
Sbjct: 181 RIIETEHFRVERVIGRYDKMIKIT 204



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 982 

A DNA sequence (GBSx1041) was identified in S.agalactiae <SEQ ID 3003> which encodes the amino
acid sequence <SEQ ID 3004>. Analysis of this protein sequence reveals the following: 

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1763 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3005> which encodes the amino acid
sequence <SEQ ID 3006>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1551 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 95/105 (90%), Positives = 100/105 (94%), Gaps = 1/105 (0%)

Query: 1 MMLKPSIDTLLDKVPSKYSLVILQAKRAHELEAGEKATQDFKSVKSTLRALEEIESGNW 60
         MMLKPSIDTLLDKVPSKYSLVILQAKRAHELEAG TQ+FKSVKSTL+ALEEIESGNW
Sbjct: 1 MMLKPSIDTLLDKVPSKYSLVILQAKRAHELEAGATPTQEFKSVKSTLQALEEIESGNW 60

Query: 61 IHPDPSAKRASVRARIEAERIAKEEEERKIKEQIAKEK-EDGEKI 104
          IHPDPSAKR +VRA+IEAERLAKEEEERKIKEQIAKEK E+GEKI
Sbjct: 61 IHPDPSAKREAVRAKIEAERLAKEEEERKIKEQIAKEKEEEGEKI 105

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
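
The identity and gap counts reported for an alignment can be recounted directly from a pair of aligned Query and Sbjct rows. The sketch below is illustrative only; `identity_and_gaps` is a hypothetical helper, and it assumes the two rows are of equal length with "-" marking a gap. Counting "Positives" is deliberately left out because it depends on the substitution matrix used for the original search, which is not stated here.

```python
def identity_and_gaps(query, sbjct):
    """Count identical columns and gapped columns in two aligned rows.

    Returns (identities, gaps, aligned_length).
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    identities = 0
    gaps = 0
    for q, s in zip(query, sbjct):
        if q == "-" or s == "-":
            gaps += 1          # a gap in either row counts once
        elif q == s:
            identities += 1    # same residue in both rows
    return identities, gaps, len(query)
```

Using the last ten columns of the GAS and GBS alignment above, `identity_and_gaps("KEK-EDGEKI", "KEKEEEGEKI")` counts eight identities and one gap over ten aligned columns.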

Example 983 

A DNA sequence (GBSx1043) was identified in S.agalactiae <SEQ ID 3007> which encodes the amino
acid sequence <SEQ ID 3008>. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3413 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB13444 GB:Z99112 primosomal replication factor Y (primosomal
protein N') [Bacillus subtilis]
Identities = 377/807 (46%), Positives = 529/807 (64%), Gaps = 21/807 (2%)

Query: 6 AQVIVDIPLMQTDKPFSYAIPKDLEDLVQVGVRVHVPFGRGNRLLQGFWGFRDDDELET 65

A+VIVD+ D+PF Y IP L+ +++ G+RV VPFG R +QGFV ++ +L 

Sbjct: 4 AEVIVDVSTKNIDRPFDYKIPDHLKGMIKTGMRVIVPFGP--RKIQGFVTAVKEASDLSG 61 

Query: 66 KDIAEV LDFEPVLNQEQLDLADQMRHTVFSYKISILKSMLPSLLNSQYDKLLL A 119 

K + EV LD PVL +E + L+ + S+KI+ L++MLP+ L ++Y+K L

Sbjct: 62 KSVKEVEDLLDLTPVLTEELMILSSWLSDKTLSFKITALQAMLPAALKAKYEKELKIAHG 121 

Query: 120 TDTLPSEDREDLFGHKTEIVFSSLSSQDAKKA-GRLIQKGFIEVQYLAKDKKTIKTEKIY 178 
D P +R LF +++S + + K R +QKG I+V Y K K + 

Sbjct: 122 ADLPPQVER--LFSETKTLLYSDIPDHETLKLIQRHVQKGDIDVTYKVAQKTNKKMVRHI 179

Query: 179 KINRTLLEKSQ IAARAKKRLELKEFLLENPQPGRLTALN KQFS S PWNFFRE 230 

+ N + E ++ ++ +A K+ + FL+ P+ ++ A SS + + 

Sbjct: 180 QANASKEELAKQAEGLSRQARKQQAILHFIiISEPEGvKIPAAELCKKTDTSSATIKTLIQ 239 






Query: 231 EGIIEVIEKEASRSDNYFKGILKTDFLDLNQECAKVVKIVVDQIGKEQNKPFLLEGITGS 290 

+G+++ +E R K KT+ L L EQ + + + + +++K FLL G+TGS 

Sbjct: 240 KGLLKESYEEVYRDPYQDKMFKKTEPLPLTDEQRAAFEPIRETLDSDEHKVFLLHGVTGS 299 



Query: 291 GKTEWLHIIDNVLKLGKTAIVLVPEISLTPQMTNRFISRFGKQVAIMHSGLSEGEKFDE 350

GKTE+YL 1+ VL GK AIVLVPEISLTPQM NRF RFG QVA+MHSGLS GEK+DE 
Sbjct: 300 GKTEIYLQSIEKVLAKGKEAIVLVPEISLTPQMVNRFKGRFGSQVA^HSGLSTGEKYDE 359 

Query: 351 WRKI KSGQAKVWGARSAI FAPLENIGAI I IDEEHESTYKQESNPRYHARDVALLRAEYY 410 
WRKI + + + WGARSAI FAP EN+G I I IDEEHES+YKQE PRYHA++VA+ RAE++

Sbjct: 360 WRKI HRKE VRLWGARSAI FAPFENLGMI I IDEEHESSYKQEEMPRYHAKEVAI KRAEHH 419 
Query: 411 KAVLLMGSATPSIESRARASRDVYKFLELKHRANPKARIPQVEIIDFRNFIGQQEVSNFT 470

+++GSATP++ES ARA + VY+ L LKHR N + +P+V ++D R + S F+ 

Sbjct: 420 SCPWLGSATPTLESYARAQKGVYELLSLKHRVNHRV-MPEVSLVDMREELRNGNRSMFS 478 


Query: 471 SYLLDKIRDRLDKKEQVVLMLNRRGYSSFIMCRDCGYVDQCPNCDISLTLHMATKTMNCH 530 

L++K+ + + K EQ VL LN+RGYSSF+MCRDCGYV QCP+CDIS+T H + + CH 
Sbjct: 479 VELMEKLEETIAKGEQAVLFLNKRGYSSFVMCRDCGYVPQCPHCDISMTYHRYGQRLKCH 538 

Query: 531 YCGFEKP I PRTCPNCNSKS I SYYGTGTQKAYEELLKVI PDAKI LRMDVDTTRQKGGHES I 590

YCG E+P+P TCP C S+ I ++GTGTQ+ EEL KV+P A+++RMDVDTT +KG HE + 
Sbjct: 539 YCGHEEPVPHTCPECASEHIRFFGTGTQRVEEELTKVLPSARVIRMDVDTTSRKGAHEKL 598 

Query: 591 LKRFGNHEADILLGTQMIAKGLDFPNVTLVGVLNADTSIaNLPDFRSSERTFQLLTQVAGR 650 
L FG +ADILLGTQMIAKGLDFPNVTLVGVL+ADT+L++PDFRS+E+TFQLLTQV+GR

Sbjct: 599 LSAFGEGKADILLGTQMIAKGLDFPNVTLVGVLSADTTLHIPDFRSAEKTFQLLTQVSGR 658 

Query: 651 AGRAEKEGEVVIQTYNPNHYAIQIAQKQDFEAFYQYEMNIRRQLGYPPYYFTVGLTLSHK 710 
AGR EK G V+IQTY P+HY+IQL + D+E FYQ+EM RR+ YPPYY+ +T+SH+ 
Sbjct: 659 AGRHEKPGHVIIQTYTPSHYSIQLTKTHDYETFYQHEMAHRREQSYPPYYYIiALVTVSHE 718

Query: 711 DEEWL IRKS YEVLSLLKQGFSDKVKLLGPTPKP I ARTHNLYHYQI 1 1 KYRFEDNLEL VLN 770 

+ + ++ LK K+LGP+ PIAR + Y YQ +IKY+ E L +L 

Sbjct: 719 EVAKAAVTAEKIAHFLKANCGADTKILGPSASPIARIKDRYRYQCVIKYKQETQLSALLK 778 






Query: 771 RLLD-MTQDKENRDLRLAIDHEPQNMM 796 

++L+ ++ E + + ++ID P MM 
Sbjct: 779 KILEHYKREIEQKHVMISIDMNPYMMM 805 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3009> which encodes the amino acid
sequence <SEQ ID 3010>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1396 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 556/793 (70%), Positives = 659/793 (82%), Gaps = 1/793 (0%)

Query: 4 KLAQVIVDIPLMQTDKPFSYAIPKDLEDLVQVGVRVHVPFGRGNRLLQGFWGFRDDDEL 63 
K+A VIVDIPLMQTDKPFSY IPK+L LVQ+G RVHVPFG+GNRLLQGF++GF +D 
Sbjct: 12 KVAHVIVDIPLMQTDKPFSYGIPKELVSLVQLGSRVHVPFGKGNRLLQGFIIGFGQEDSS 71

Query: 64 ETKDIAEVLDFEPVLNQEQLDLADQMRHTVFSYKISILKSMLPSLLNSQYDKLLLATDTL 123 

K I VLD EPVLNQEQL LADQ+R TVFSYKI++LK+M+P+LLNS YDK+L L 
Sbjct: 72 SLKLIQTVLDPEPVLNQEQLTLADQLRKWFSYKITLLKAMIPNLLNSNYDKVLRPESGL 131 


Query: 124 PSEDREDLFGHKTEIVFSSLSSQDAKKAGRLIQKGFIEVQYLAKDKKTIKTEKIYKINRT 183 

DR+ LF K +++S+L + KA+IQGIV YLAKDKK +KTEK Y ++ 
Sbjct: 132 KKSDRDFLFEGKPSVLYSTLDREKEKIALKGIQAGHITVSYLAKDKKNLKTEKYYHVDLD 191 

Query: 184 LLEKSQIAARAKKRLELKEFLLENPQPGRLTALNKQFSSPWNFFREEGIIEVIEKEASR 243

L I++RAKKR LK++LL + + +L L + FS W +F +1 + E+ R 
Sbjct: 192 AIAvnPISSRAKKKQLLKDYLLTHTKEAKLATLYQAFSRDWAYFVTNHLIRIDERPIDR 251 

Query: 244 SDNYFKGILKTDFLDLNQEQAKVVKIVVDQIGKEQNKPFLLEGITGSGKTEVYLHIIDNV 303 
S++YF I + FL LN++QA V +V+QIGK +KPFL+EGITGSGKTE VYLHI I + V

Sbjct: 252 SESYFDQIKPSSFLTLNEQQASAVTEIVEQIGKP-SKPFLIEGITGSGKTEVYLHIIEAV 310 

Query: 304 LKLGKTAIVLVPEISLTPQMTNRFISRFGKQVAIMHSGLSEGEKFDEWRKIKSGQAKVW 363 
LK KTAIVLVPEISLTPQMT+RFISRFGKQVAIMHSGLS+GEKFDEWRKIK+GQAKWV 
Sbjct: 311 LKQDKTAIVLVPEISLTPQMTSRFISRFGKQVAIMHSGLSDGEKFDEWRKIKTGQAKWV 370
Query: 364 [sequence not legible] 423
           GARSAI F+PLE IGAIIIDEEHESTYKQESNPRYHAR+VALLRA++++AV++MGSATPSI
Sbjct: 371 GARSAIFSPLERIGAIIIDEEHESTYKQESNPRYHAREVALLRAKHHQAVWMGSATPSI 430

Query: 424 [sequence not legible] 483
           ESRARAS+ VY F++L RANP A+IP+V I+DFR++IGQQ VSNFT YL+DKI++RL K
Sbjct: 431 ESRARASKGVYHFIQLTQRANPLAKIPEVTIVDFRDYIGQQAVSNFTPYLIDKIKERLVK 490

Query: 484 [sequence not legible] 543
           KEQWLMIjNRRGYSSF+MCRDCGYVD+CPNCDISLTLHM TKTMNCHYCGF+KPIP TCP
Sbjct: 491 KEQVVLMIJSfRRGYSSFVMCRDCGYV^ 550

Query: 544 [sequence not legible] 603
           C+S SI YYGTGTQKA++EL VI P+AKILRMDVDTTR+K H++IL FG EADILL
Sbjct: 551 ECHSNSIRYYGTGTQKAFDELQGVIPEAKILRMDVDTTRKKRSHKTILDSFGRQEADILL 610

Query: 604 [sequence not legible] 663
           GTQMIAKGLDFPNVTLVGVLNADTSLNLPDFR+SE+TFQLLTQVAGRAGRA K GEV+IQ
Sbjct: 611 GTQMIAKGLDFPNVTLVGVLNADTSLNLPDFRASEKTFQLLTQVAGRAGRAHKPGEVLIQ 670

Query: 664 [sequence not legible] 723
           TYNP+HYAIQIxA+KQDFEAFY+YEM+IR Q+ YPPYYFTVG+TLSH+ E +++K+Y+V
Sbjct: 671 TYNPDHYAI QliAKKQDFEAFYRYEMS IRHQMAYPPYYFTVGI TLSHRLEAS WKKAYQVT 730

Query: 724 SLLKQGFSDKVKLLGPTPKPIARTHNLYHYQI I IKYRFEDNLELVLNRLLDMTQDKENRD 783
           LLK SD +K+LGPTPKPIARTHNLYHYQI++KYRFEDNLE LNR+LD +Q+ +NR
Sbjct: 731 ELLKSHLSDNIKILGPTPKPIARTHNLYHYQILLKYRFEDNLEETIiNRILDWSQEADNRH 790

Query: 784 LRLAIDHEPQNMM 796
           L+L ID EPQ +
Sbjct: 791 LKLIIDCEPQQFL 803





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
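
Because every alignment row carries its own start and end coordinates, a row can be checked for internal consistency: the start coordinate plus the number of residues (ignoring gap characters) minus one should equal the end coordinate. The following is a rough sketch of such a check, offered as an aid for spotting rows damaged during text extraction; `row_is_consistent` is a hypothetical name, and the second sample row in the usage note is a deliberately damaged variant used only for illustration.

```python
import re

# "<Query|Sbjct>: <start> <sequence> <end>"; the sequence may contain
# gap characters ("-") and stray spaces introduced by text extraction.
_ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(.*\S)\s+(\d+)\s*$")


def row_is_consistent(row):
    """Return True if start + residue count - 1 equals the end coordinate."""
    m = _ROW.match(row.strip())
    if m is None:
        return False
    start, seq, end = int(m.group(2)), m.group(3), int(m.group(4))
    residues = sum(1 for c in seq if c not in "- ")
    return start + residues - 1 == end
```

A well-formed row such as `Query: 61 IHPDPSAKRASVRARIEAERIAKEEEERKIKEQIAKEK-EDGEKI 104` passes the check, whereas `Query: 784 LRIiAIDHEPQNMM 796`, which carries one extra character, does not. A failed check only flags a row for inspection; it does not indicate which character is wrong.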

Example 984 

A DNA sequence (GBSx1044) was identified in S.agalactiae <SEQ ID 3011> which encodes the amino
acid sequence <SEQ ID 3012>. This protein is predicted to be methionyl-tRNA formyltransferase (fmt).
Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1329 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13446 GB:Z99112 methionyl-tRNA formyltransferase [Bacillus subtilis] 
Identities = 155/314 (49%), Positives = 221/314 (70%), Gaps = 7/314 (2%)

Query: 1 MTKLLFMGTPDFSATVLKGILADGKYDVjSAVOTQPDRA 60 

MT+++FMGTPDFS VL+ ++ DG Y+V+ WTQPDR GRKK + PVKE AL + IP 
Sbjct: 1 MTRIVFMGTPDFSVPVIjRTLIEDG-YEVVGvVTQPDRPKGRKKVLTPPPVKEEALRHGIP 59 

Query: 61 VYQPEKLSGSPELEQLMTLGADGIVTAAFGQFLPTKLLESVGFA- INVHASLLPKYRGGA 119 

V QPEK+ + E+E+++ L D IVTAAFGQ LP +LL+S + INVHASLLP+ RGGA 
Sbjct: 60 VLQPEKVRLTEEIEKVLALKPDLIVTAAFGQILPKELj^SPKYGCINVHASLLPELRGGA 119 

Query: 120 PIHYAIINGEKEAGOTIMEMVAK^AGDMVSKASVEITDEDNVGTMFDRIAWGRDLLLD 179 
PIHY+I+ G+K+ G+TIM MV K+DAGDM+SK V+I + DNVGT+ D+L+V G LL + 
Sbjct: 120 PIHYSILQGKKKTGITIMYMVEKLDAGDMISKVEVDIEETDNVGTLHDKLSVAGAKLLSE 179

Query: 180 TLPGYLSGDIKPIPQNEEEVSFSPNISPDEERIDWNKSSRDIFNHVRGMYPWPVAHTLLE 239
           T+P ++G I P Q+EE+ +++PNI ++E +DW+++ +++N +RG+ PWPVA+T L
Sbjct: 180 [sequence not legible]

Query: 240 [sequence not legible]
           K++ + + PG V+A K + VATG + A+ L +QPAGK RM +DF+
Sbjct: 240 [sequence not legible]

Query: 297 [sequence not legible]
           ++E GD G
Sbjct: 300 [sequence not legible]

A related DNA sequence was identified in S.pyogenes <SEQ ID 3013> which encodes the amino acid
sequence <SEQ ID 3014>. Analysis of this protein sequence reveals the following:

Possible site: 15

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0730 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 217/310 (70%), Positives = 266/310 (85%)

Query: 1 MTKLLFMGTPDFSATVLKGILADGKYDVLAVVTQPDRAVGRKKEIKMTPVKEVALFJWNIP 60
         M KLLFMGTP FSATVLKG+L + Y++L WTQPDRAVGRKK+IK+TPVK++ALE+ I
Sbjct: 1 MIKLLFMGTPQFSATVLKGLLDNPAYEILGVVTQPDRAVGRKKDIKVTPVKQLALEHGIS 60

Query: 61 VYQPEKLSGSPELEQLMTLGADGIVTAAFGQFLPTKLLESVGFAINVHASLLPKYRGGAP 120
          +YQPEKLSGS EL ++M LGADGI +TAAFGQFLPT LL+SV FAINVHASLLPKYRGGAP
Sbjct: 61 [sequence not legible] 120

Query: 121 [sequence not legible] 180
           IHYAI+NG+KEAGVTIMEM+ +MDAGDMV+KAS I + DNVGT+F++LA++GRDLLLD+
Sbjct: 121 IHYAIMNGDKEAGVTIMEMIKEMDAGDMVAKASTPILETDNVGTLFEKLAIIGRDLLLDS 180

Query: 181 LPGYLSGDIKPIPQNEEEVSFSPNISPDEERIDWNKSSRDIFNHVRGMYPWPVAHTLLEG 240
           LP YLSG++KPIPQ+ + +FSPNISP+ E++DW S++++FNH+RGM PWPVAHT LEG
Sbjct: 181 LPAYLSGELKPIPQDHSQATFSPNISPEHEKLDWTMSNQEVFNHIRGMNPWPVAHTFLEG 240

Query: 241 NRFKLYEVTMSEGKGSPGQVIAKTKNSLTVATGDGAIELKSVQPAGKPRMDIKDFLNGVG 300
           R K+YE ++EG+G PGQV+ KTK SL +ATG GA+ L VQPAGKP+M I DFLNG+G
Sbjct: 241 QRLKIYEAQLAEGEGLPGQWVKTKKSLVIATGQGALSLIWQPAGKPKMSIIDFLNGIG 300

Query: 301 RNLEIGDKFG 310
           R LE+GD G
Sbjct: 301 RKLEVGDIIG 310

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 985 

A DNA sequence (GBSx1045) was identified in S.agalactiae <SEQ ID 3015> which encodes the amino
acid sequence <SEQ ID 3016>. This protein is predicted to be sunL protein (sun). Analysis of this protein
sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence





Final Results

bacterial cytoplasm Certainty=0.1677 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA10711 GB:AJ132604 sunL protein [Lactococcus lactis] 
Identities = 222/434 (51%), Positives = 305/434 (70%), Gaps = 15/434 (3%) 

Query: 7 KSARGLALMTLEEVFDKGAYSNIALNKSLKKSRLSDKDRALOTEIWGTVARKITLEWYL 66 

K+AR AL L ++F AY+NI+L+++L+ S LS D+ VT +VYG V++K LEWY+ 
Sbjct: 3 KNARQTALDVLNDIFGNDAYANISLDRNLRDSELSTVDKGFVTALVYGVVSKKALLEWYI 62 

Query: 67 SHFIVDRDKLELWVYHLLLLSLYQLLYLDNIPDHAIVNDAVTIAKNRGNKKGAEKLINAV 126

+ + K W LLLL++YQ+L++D +P A V++AV IAK R + + INAV 
Sbjct: 63 TPLLKKEPKP- -WAKMLLLLTIYQVLFMDKVPISAAVDEAVKIAK-RHDGQATANFINAV 119 

Query: 127 LRR-VSSETLPEIASIKRQNKRYSVAYSMPVWLVKKLIDQYGETRALAIMESLFERNKAS 185 
LR + SE E + K + YSMP L+ K++ Q+G R I+ESL + + S

Sbjct: 120 LRNFMRSEHRNE EPKDWETKYSMPKLLLDKMVRQFGGKRTGEILESLEKPSHVS 173 

Query: 186 LRVTDLSQKQTIKETLNVRDSHIAETALVADSGNFASTSFFQDGLITIQDESSQLVAPTIi 245 
LR D + E R S + ETAL+ADSGNF+ T FQ G ITIQDE+SQLVAP L 

Sbjct: 174 LRKIDPTV EIAGTRPSLLTETALIADSGNFSITEEFQTGRITIQDETSQLVAPQL 228

Query: 246 KVSGNDQ VLDACSAPGGKTSH I AS YLTTGAVTALDLYDHKLEL VMENAKRLGLSDKI KTK 305 

++ G + + VLDAC+APGGK+ +H+A YLTTG +TALDLY+HKL+L+ +NA+R ++DK1 T+ 
Sbjct: 229 ELEGTEEVLDACAAPGGKSTHMAQYLTTGHITALDLYEHKLDLINQNAQRQHVADKITTQ 288 


Query: 306 KLDASKAHEYFLEDTFDKILVDAPCSGIGLIRRKPDIKYNKANQDFEALQEIQLSILSSV 365
K DA+ +E F + FD+ILVDAPCSGIGLIRRKPDI+Y K + DF LQ+IQL IL+S 
Sbjct: 289 KADATMIYENFGPEKFDRILVDAPCSGIGLIRRKPDIRYRKESSDFIDLQKIQLEILNSA 348 

Query: 366 CQTLRKGGIITYSTCTIFEEENFQVIEKFLENHPNFEQVELSHTQEDIVKRGCISISPEQ 425

++L+K GI+ YSTCTIF+EENF V+ +FLENHPNFEQVE+S+ + +++K GC+ I+PE 
Sbjct: 349 SKSLKKSGIMVYSTCTIFDEENFDWHEFLENHPNFEQVEISNEKPEVIKEGCLFITPEM 408 

Query: 426 YHTDGFFIGQVKRI 439 
YHTDGFFI + K+I

Sbjct: 409 YHTDGFF I AKFKKI 422 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3017> which encodes the amino acid
sequence <SEQ ID 3018>. Analysis of this protein sequence reveals the following:

Possible site: 42

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA10711 GB:AJ132604 sunL protein [Lactococcus lactis]
Identities = 208/433 (48%), Positives = 287/433 (66%), Gaps = 13/433 (3%)

Query: 7 KSTRGKALLVIEAIFDQGAYTNIALNQQLSNKALSAKDRALLTEIVYGTVSRKISLEWYL 66 

K+ R AL V+ IF AY NI+L++ L + LS D+ +T +VYG VS+K LEWY+ 
Sbjct: 3 KNARQTALDVLNDIFGNDAYANISLDRNLRDSELSTVDKGFVTALWGWSKKALLEWYI 62 



60 



Query: 67 AHYVKERDKLDKWVYYLLMLSLYQLTYLDKLPAHAIVNDAVGIAKNRGNKKGAEKFVNAI 126 

+K K W LL+L++YQ+ ++DK+P A V++AV IAK R + + F+NA+ 
Sbjct: 63 TPLLKKEPK- -PWAKMLLLLTIYQVLFMDKVPISAAVDEAVKIAK-RHDGQATANFINAV 119 
Query: 127 LRQFTSHPLPDMETIKRRNKYYSVKYSLPVWLVKKLEDQFGSDRSVAIMESLFVRSKASI 186

LR F E K + KYS+P L+ K+ QFG R+ I+ESL S S+ 

Sbjct: 120 LRNFMRS EHRNEEPKDWETKYSMPKLLLDKMVRQFGGKRTGEILESLEKPSHVSL 174 

Query: 187 RVTDPLKLEEVAEALDAERSLLSATGLTKASGHFAASDYFTNGDITIQDESSQLVAPTLN 246
R DP E SLL+ T L SG+F+ ++ F G ITIQDE+SQLVAP L 

Sbjct: 175 RKIDP TVEIAGTRPSLLTETALIADSGNFSITEEFQTGRITIQDETSQLVAPQLE 229 

Query: 247 IDGDDIILDACSAPGGKTSHIASYLKTGBCVIALDLYDHKLELVKENANRLGVADNIETRK 306 
++G + +LDAC+APGGK+ +H+A YL TG + ALDLY+HKL+L+ +NA R VAD I T+K

Sbjct: 230 LEGTEEVLDACAAPGGKSTHMAQYLTTGHITALDLYEHKLDLINQNAQRQHVADKITTQK 289 

Query: 307 LDAREVHRHFEKDSFDKILVDAPCSGIGLIRRKPDIKYNKESQGFNALQAIQLEILSSVC 366 
DA ++ +F + FD+ILVDAPCSGIGLIRRKPDI+Y KES F LQ IQLEIL+S 
Sbjct: 290 ADATMIYENFGPEKFDRILVDAPCSGIGLIRRKPDIRYRKESSDFIDLQKIQLEILNSAS 349

Query: 367 QTLRKGGIITYSTCTIFDEENRQVIEAFLQSHPNFEQVKLNHTQADIVKDGYLIITPEQY 426 

++L+K GI+ YSTCT I FDEEN V+ FL++HPNFEQV++++ + +++K+G L ITPE Y 
Sbjct: 350 KSLKKSGIMVYSTCTIFDEENFDVVHEFLENHPNFEQVEISNEKPEVIKEGCLFITPEMY 409 


Query: 427 QTDGFFIGQVRRV 439 

TDGFFI + +++ 
Sbjct: 410 HTDGFFIAKFKKI 422 

An alignment of the GAS and GBS proteins is shown below.

Identities = 305/440 (69%), Positives = 370/440 (83%)

Query: 1 MANDWKKSARGLALMTLEEVFDKGAYSNIAI^KSLKKSRLSDKDRALvTEIvYGTVARKI 60 
+A++WKKS RG AL+ +E +FD+GAY+NIALN+ L LS KDRAL+TEI VYGTV+RKI 
Sbjct: 1 IJmNWKKSTRGKALLVIEAIFDQGAYTNIAl^^ 60

Query: 61 TLEWYLSHFIVDRDKLELWVYHLLLLSLYQLLYLDNIPDHAIVM3AOTIAKNRGNKKGAE 120 

+LEWYL+H++ DRDKL+ WVY+LL+LSLYQL YLD +P HAIVNDAV IAKNRGNKKGAE 
Sbjct: 61 SLEWYLAHYVKDRDKLDKWVYYLLMLSLYQLTYLDKLPAHAIVNDAVGIAKNRGNKKGAE 120 


Query: 121 KLINAVLRRVSSETLPEIASIKRQNKRYSVAYSMPVWLVKKLIDQYGETRALAIMESLFE 180 

K +NA+LR+ +S LP++ +IKR+NK YSV YS+PVWLVKKL DQ+G R++AIMESLF 
Sbjct: 121 KFVNAILRQFTSHPLPDMETIKRRNKY!fSVKYSLPVWLVKKLEDQFGSDRSVAIMESLFV 180 

Query: 181 RNKASLRVTDLSQKQTIKETLNVRDSHIAETALVADSGNFASTSFFQDGLITIQDESSQL 240

R+KAS+RVTD + + + EL+ S++TL SG+FA++ +F +G ITIQDESSQL 
Sbjct: 181 RSKASIRVTDPLKLEEVAEALDAERSLLSATGLTKASGHFAASDYFTNGDITIQDESSQL 240 

Query: 241 VAPTLKVSGNDQVLDACSAPGGKTSHIASYLTTGAVTALDLYDHKLELvMENAKRLGLSD 300 
VAPTL + G+D +LDACSAPGGKTSHIASYL TG V ALDLYDHKLELV ENA RLG++D

Sbjct: 241 VAPTLNIDGDDIILDACSAPGGKTSHIASYLKTGKVIALDLYDHKLELVKENANRLGVAD 300 

Query: 301 KIKTKKLDASKAHEYFLEDTFDKILVDAPCSGIGI.IRRKPDIKYNKANQDFEALQEIQDS 360 
I+T+KLDA + H +F +D+FDKILVDAPCSG1GLIRRKPDIKYNK +Q F ALQ IQL 
Sbjct: 301 NIETRKLDAREVHRHFEKDSFDKILVDAPCSGIGLIRRKPDIKYNKESQGFNALQAIQLE 360

Query: 361 ILSSVCQTLRKGGIITYSTCTIFEEENFQVIEKFLENHPNFEQVELSHTQEDIVKRGCIS 420 

ILSSVCQTLRKGGIITYSTCTIF+EEN QVIE FL++HPNFEQV+L+HTQ DIVK G + 
Sbjct: 361 ILSSVCQTLRKGG1ITYSTCTIFDEENRQVIEAFLQSHPNFEQVKLNHTQADIVKDGYLI 420 






Query: 421 ISPEQYHTDGFFIGQVKRIL 440 

I+PEQY TDGFFIGQV+R+L 
Sbjct: 421 ITPEQYQTDGFFIGQVRRvTj 440 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 986

A DNA sequence (GBSx1046) was identified in S.agalactiae <SEQ ID 3019> which encodes the amino
acid sequence <SEQ ID 3020>. This protein is predicted to be pppL protein. Analysis of this protein
sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5796 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA10712 GB:AJ132604 pppL protein [Lactococcus lactis] 
Identities = 131/245 (53%), Positives = 177/245 (71%), Gaps = 4/245 (1%)



Query: 1 MEISLLTDIGQRRSNNQDFINQFENKAGVPLIILADGMGGHRAGNIASEMTVTDLGSDWA 60
         ME S+L+DIG +RS NQD++ + N+AG L +LADGMGGH+AGN+AS++TV DLG W+
Sbjct: 1 MEYSILSDIGSKRSTNQDWGTYVNRAGYQLFLIAIX3MGGHKAGNVASKLTVEDLGKLWS 60

Query: 61 ETDF---SELSEIRDWMLVSIETENRKIYELGQSDDYKGMGTTIEAVAIVGDNIIFAHVG 117
          ET F + + + W+ + EN I LG+ D+Y+GMGTT+EA+ I G+ I+ AHVG
Sbjct: 61 ETFFDAGTPEATLEIWLRNQVRNENENIASLGKLDEYQGMGTTLEALVIKGNTIVSAHVG 120

Query: 118 DSRIGIVRQGEYHLLTSDHSLVNELVKAGQLTEEEAASHPQKNIITQSIGQANPVEPDLG 177
           DSR ++R GE + +T+DHSLV ELV AGQ+TEEEA HP KNIIT+S+GQ N V+ D+
Sbjct: 121 DSRTYIJlRDGEI^KITTDHSLVQEIiVnAGQITEEEAEvHPNKNIITRSLGQTOEVQADIQ 180

Query: 178 VHLLEEGDYLVVNSDGLTNMLSNADIATVLTQEK-TLDDKNQDLITLANHRGGLDNITVA 236
           L+ GD +++NSDGLTNM+S +I VL +E TLD+K++ LI LAN GGLDNITV
Sbjct: 181 ALELQAGDIILMNSDGLTNMVSTTEIMEvLEREDLTIiDNKSEALIRLftNEHGGIjIJNITVV 240

Query: 237 LVYVE 241
           L+ E
Sbjct: 241 LIKFE 245





A related DNA sequence was identified in S.pyogenes <SEQ ID 3021> which encodes the amino acid
sequence <SEQ ID 3022>. Analysis of this protein sequence reveals the following:

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5301 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 180/245 (73%), Positives = 220/245 (89%)

Query: 1 MEISLLTDIGQRRSNNQDFINQFENKAGVPLIILADGMGGHRAGNIASEMTVTDLGSDWA 60

M+ISL TDIGQ+RSNNQDFIN+F+NK G+ L+ILADGMGGHRAGNIASEMTVTDLG +W 
Sbjct: 1 MKISLKTDIGQKRSNNQDFINKFDNKKGITLVILADGMGGHRAGNIASEMTVTDLGREWV 60 

Query: 61 ETDFSELSEIRDWMLVS IETENRKI YELGQSDDYKGMGTTIEAVAIVGDNI I FAHVGDSR 120 

+TDF+ELS+IRDW+ + I ++EN+ + I Y+LGQS+D+KGMGTT+EAVA+V + I+AH+GDSR 
Sbjct: 61 KTDFTELSQIRDWLFETIQSENQRIYDLGQSEDFKGMGTTVEAVALVESSAIYAHIGDSR 120 



Query: 121 IGIVRQGEYHLLTSDHSLvNELVKAGQLTEEEAASHPQKNIITQSIGQANPVEPDLGVHL 180 

IG+V G Y LLTSDHSLVNELVKAGQ+TEEEAASHPQ+NIITQSIGQA+PvEPDLGV + 
Sbjct: 121 IGLVHDGHYTLLTSDHSLVNELVKAGQITEEEAASHPQRNIITQSIGQASPVEPDLGVRV 180 
Query: 181 LEEGDYLVVNSDGLTNMLSNADIATVLTQEKTLDDKNQDLITLANHRGGLDNITVALVYV 240

LE GDYLV+NSDGLTNM+SN +I T+L + +LD+KNQ++I LAN RGGLDNIT+ALV+
Sbjct: 181 LEPGDYLVINSDGLTNMISNDEIVTILGSKVSLDEKNQEMIDLANLRGGLDNITIALVHN 240 

Query: 241 ESEAV 245

ESE V 
Sbjct: 241 ESEDV 245 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 987 

A DNA sequence (GBSx1047) was identified in S.agalactiae <SEQ ID 3023> which encodes the amino
acid sequence <SEQ ID 3024>. Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -10.03 Transmembrane 346 - 362 ( 340 - 372)

Final Results

bacterial membrane Certainty=0.5012 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9539> which encodes amino acid sequence <SEQ ID 9540> 
was also identified. 
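
Where the analysis reports predicted membrane-spanning segments, each "INTEGRAL Likelihood" line carries a score and two residue ranges. The sketch below, given purely as an illustration, pulls those values out of the text; `parse_transmembrane` is a hypothetical helper, and the keys `first_range` and `bracketed_range` are neutral labels chosen here rather than terminology taken from the source. The pattern also accepts a "{" in place of "(" because that substitution occurs in this listing.

```python
import re

_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*[({]\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_transmembrane(text):
    """Return one dict per INTEGRAL line found in the text."""
    segments = []
    for m in _TM.finditer(text):
        segments.append(
            {
                "likelihood": float(m.group(1)),
                "first_range": (int(m.group(2)), int(m.group(3))),
                "bracketed_range": (int(m.group(4)), int(m.group(5))),
            }
        )
    return segments
```

For the GBS protein of this Example, the single INTEGRAL line gives a likelihood of -10.03 with ranges 346 to 362 and 340 to 372.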

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA10713 GB:AJ132604 hypothetical protein [Lactococcus lactis]
Identities = 219/380 (57%), Positives = 284/380 (74%), Gaps = 8/380 (2%)

Query: 1 MIQIGKLFAGRYRILKSIGRGGMADVYLARDLILDNEEVAIKVLRTNYQTDQIAVARFQR 60
         MIQIGK+FA RYRI+K IGRGGMA+VY D L + +VAIKVLR+N++ D IA+ARFQR
Sbjct: 1 MIQIGKIFADRYRIIKEIGRGGMANVYQGEDTFLGDRKVAIKVLRSNFENDDIAIARFQR 60

Query: 61 EARAMAELTHPNIVAIRDIGEEDGQQFLVMEYVDGFDLKKYIQDNAPLSNNEVVRIMNEV 120
          EA AMAEL+HPNIV I D+GE + QQ++VME+VDG LK+YI NAPL+N+E + I+ E+
Sbjct: 61 EAFAMAELSHPNIVGISDVGEFESQQYIVMEFVDGMTLKQYINQNAPLANDEAIEIITEI 120

Query: 121 LSAMSIAHQKGIVHRDLKPQNILLTKKGTVKVTDFGIAVAFAETSLTQTNSMLGSVHYLS 180 

LSAM +AH GI+HRDLKPQN+L++ GTVKVTDFGIA A +ETSLTQTN+M GSVHYLS 
Sbjct: 121 LSAMDMAHSHGIIHRDLKPQNVLVSSSGTVKVTDFGIAKALSETSLTQTNTMFGSVHYLS 180 


Query: 181 PEQARGSKATVQSDIYAMGIMLFEMLTGHIPYDGDSAVTIALQHFQKPLPSILAENKSVP 240 

PEQARGS ATVQSDIYA+GI+LFE+LTG IP+DGDSAV IAL+HFQ+ +PSI+ N VP 
Sbjct: 181 PEQARGSNATVQSDIYAIGIILFELLTGQIPFDGDSAVAIALKHFQENIPSIINLNPEVP 240 

Query: 241 QALENIVIKATAKKLTDRYKTTYEMGRDLSTALSSTRHREPKLVFN-DTESTKTLPKVTS 299

QALEN+VI KATAK + +RY EM D++T+ S R E KLVFN D + TK +P + 
Sbjct: 241 QALENWIKATAKDINNRYADVEEMMTDVATSTSLDRRGEEKLVFNKDHDETKIMP--AN 298 

Query: 300 TVSSLTTEQLLRNQKQAKTTEKITPDSASNDKTKSKKKASHRLLGTIMKLFFALCWGII 359 
++ T+ L+ K+ EK +S++ + K+K K S + G 1+ L L V+G

Sbjct: 299 LINPYDTKPLI - -DKKTDDQEKAQSESSTTENNKNKNKKSKK- -GLIISLWLLLVIGGG 354 

Query: 360 VFAYKILVSPTTIRVPDVSN 379 
FA+ + +PT ++VP+V+N 
Sbjct: 355 AFAWAV-STPTNVKVPNVTN 373



A related DNA sequence was identified in S.pyogenes <SEQ ID 3025> which encodes the amino acid 
sequence <SEQ ID 3026>. Analysis of this protein sequence reveals the following: 
Possible site: 56

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.60 Transmembrane 349 - 365 ( 340 - 370)

Final Results

bacterial membrane Certainty=0.4439 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAA10713 GB:AJ132604 hypothetical protein [Lactococcus lactis]
Identities = 209/378 (55%), Positives = 273/378 (71%), Gaps = 8/378 (2%)

Query: 1 MIQIGKLFAGRYRILKSIGRGGMADVYLANDLILDNEDVAIKVLRTNYQTDQVAVARFQR 60
         MIQIGK+FA RYRI+K IGRGGMA+VY D L + VAIKVLR+N++ D +A+ARFQR
Sbjct: 1 MIQIGKIFADRYRIIKEIGRGGMANVYQGEDTFLGDRKVAIKVLRSNFENDDIAIARFQR 60

Query: 61 EARAMAELNHPNIVAIRDIGEEDGQQFLVMEYVDGADLKRYIQNHAPLSNNEVVRIMEEV 120
          EA AMAEL+HPNIV I D+GE + QQ++VME+VDG LK+YI +APL+N+E + I+ E+
Sbjct: 61 EAFAMAELSHPNIVGISDVGEFESQQYIVMEFVDGMTLKQYINQNAPLANDEAIEIITEI 120

Query: 121 LSAMTLAHQKGIVHRDLKPQNILLTKEGVVKVTDFGIAVAFAETSLTQTNSMLGSVHYLS 180 

LSAM +AH GI+HRDLKPQN+L++ G VKVTDFGIA A +ETSLTQTN+M GSVHYLS 
Sbjct: 121 LSAMDMAHSHGIIHRDLKPQNVLVSSSGTVKVTDFGIAKALSETSLTQTNTMFGSVHYLS 180 

Query: 181 PEQARGSKATIQSDIYAMGIMLFEMLTGHIPYDGDSAVTIALQHFQKPLPSIIEENHNVP 240 

PEQARGS AT+QSDIYA+GI+LFE+LTG IP+DGDSAV IAL+HFQ+ +PSII N VP 
Sbjct: 181 PEQARGSNATVQSDIYAIGIILFELLTGQIPFDGDSAVAIALKHFQENIPSIINLNPEVP 240 

Query: 241 QALENVVIRATAKKLSDRYGSTFEMSRDLMTALSYNRSRERKIIF-ENVESTKPLPKVAS 299 

QALENVVI+ATAK +++RY EM D+ T+ S +R E K++F ++ + TK +P 
Sbjct: 241 QALENVVIKATAKDINNRYADVEEMMTDVATSTSLDRRGEEKLVFNKDHDETKIMPANLI 300 

Query: 300 GPTASVKLSPPTPTVLTQESRLDQTNQTDALQPPTKKKKSGRFLGTLFKILFSFFIVGVA 359 
P + L QE +++ T+ + KK K G + + +L ++G 

Sbjct: 301 NPYDTKPLIDKKTD--DQEKAQSESSTTENNKNKNKKSKKGLIISLWLLL VIGGG 354 

Query: 360 LFTYLILTKPTSVKVPNV 377 
F + + T PT+VKVPNV 
Sbjct: 355 AFAWAVST-PTNVKVPNV 371 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 390/643 (60%), Positives = 480/643 (73%), Gaps = 29/643 (4%) 

Query: 1 MIQIGKLFAGRYRILKSIGRGGMADVYLARDLILDNEEVAIKVLRTNYQTDQIAVARFQR 60 

MIQIGKLFAGRYRILKSIGRGGMADVYLA DLILDNE+VAIKVLRTNYQTDQ+AVARFQR 
Sbjct: 1 MIQIGKLFAGRYRILKSIGRGGMADVYLANDLILDNEDVAIKVLRTNYQTDQVAVARFQR 60 

Query: 61 EARAMAELTHPNIVAIRDIGEEDGQQFLVMEYVDGFDLKKYIQDNAPLSNNEVVRIMNEV 120 
EARAMAEL HPNIVAIRDIGEEDGQQFLVMEYVDG DLK+YIQ++APLSNNEVVRIM EV 

Sbjct: 61 EARAMAELNHPNIVAIRDIGEEDGQQFLVMEYVDGADLKRYIQNHAPLSNNEVVRIMEEV 120 

Query: 121 LSAMSLAHQKGIVHRDLKPQNILLTKKGTVKVTDFGIAVAFAETSLTQTNSMLGSVHYLS 180 
LSAM+LAHQKGIVHRDLKPQNILLTK+G VKVTDFGIAVAFAETSLTQTNSMLGSVHYLS 
Sbjct: 121 LSAMTLAHQKGIVHRDLKPQNILLTKEGVVKVTDFGIAVAFAETSLTQTNSMLGSVHYLS 180 

Query: 181 PEQARGSKATVQSDIYAMGIMLFEMLTGHIPYDGDSAVTIALQHFQKPLPSILAENKSVP 240 

PEQARGSKAT+QSDIYAMGIMLFEMLTGHIPYDGDSAVTIALQHFQKPLPSI+ EN +VP 
Sbjct: 181 PEQARGSKATIQSDIYAMGIMLFEMLTGHIPYDGDSAVTIALQHFQKPLPSIIEENHNVP 240 



Query: 241 QALENIVIKATAKKLTDRYKTTYEMGRDLSTALSSTRHREPKLVFNDTESTKTLPKVTS- 299 

QALEN+VI+ATAKKL+DRY +T+EM RDL TALS R RE K++F + ESTK LPKV S 
Sbjct: 241 QALENVVIRATAKKLSDRYGSTFEMSRDLMTALSYNRSRERKIIFENVESTKPLPKVASG 300 

Query: 300 TVSSLTTEQLLRNQKQAKTTEKITPDSASNDKTKSKKKASHRLLGTIMKL 349 

T + LTEL Q T+ + P+ KKK S R LGT+ K+ 
Sbjct: 301 PTASVKLSPPTPTVLTQESRL DQTNQTDALQPPT KKKKSGRFLGTLFKI 349 

Query: 350 FFALCVVGIIVFAYKILVSPTTIRVPDVSNKTVAQAKMTLENSGLKVGAIRNIESDSVSE 409 

F+ +VG+ +F Y IL PT+++VP+V+ ++ AK L + GLKVG IR IESD+V+E 
Sbjct: 350 LFSFFIVGVALFTYLILTKPTSVKVPNVAGTSLKVAKQELYDVGLKVGKIRQIESDTVAE 409 

Query: 410 GLVVKTDPAAGRSRREGAKVNLYIATPNKSFTLGNYKEHNYKDILKDL-QGKGVKKSLIK 468 

G VV+TDP AG ++R+G+ + LY++ NK F + NYK +Y++ + L + GV KS IK 
Sbjct: 410 GNVVRTDPKAGTAKRQGSSITLYVSIGNKGFDMENYKGLDYQEAMNSLIETYGVPKSKIK 469 

Query: 469 VKRKIMSTOYTTGTIIAQSLPEGTSFNPDGNKKLTLTVAVIffiPMIMPDVTGMTVGEVIETL 528 

++R + N+Y T+++QS G FNP+G K+TL+VAV+D + MP VT + + + TL 
Sbjct: 470 IERIVTI^YPENWISQSPSAGDKFNPigGKSKITLSVAVSDTITMPMVTEYSYADAVNTL 529 

Query: 529 TDLGLDADNLVFYQMQNGV- - -YQTWTPPSSSKIASQDPYYGGEVGLRRGDKVKLYLLG 585 

T LG+DA +Y + +++PS+++Q PYYG + h ++ LYL 

Sbjct: 530 TALGIDASRIKAYVPSSSSATGFVPIHSPSSKAIVSGQSPYYGTSLSLSDKGEISLYLYP 589 

Query: 586 SKTTNNSSSTPIDSSASSSTGTTTSDSVSSSTDASTSDSSSTS 628 

+T ++SSS+ SS SSS ++ +DS + ++ S S +TS 
Sbjct: 590 EETHSSSSSS SSTSSSNSSS INDSTAPGSNTELSPSETTS 629 
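
As an illustration only, the summary line that heads each alignment above can be read mechanically. The following Python sketch is not part of the original analysis: it parses a line such as the one given for the GAS and GBS proteins and recomputes the identity percentage as a truncated integer. The names `parse_summary` and `percent` are hypothetical, and the truncation rule is an assumption that happens to reproduce the stated identity figure.

```python
import re

# One "Name = count/length (pct%)" field; "-" is accepted in place of "="
# because some summary lines in this text were scanned that way.
SUMMARY_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*[=\-]\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_summary(line):
    """Return {field: (count, alignment_length, stated_percent)}."""
    return {
        name: (int(count), int(length), int(pct))
        for name, count, length, pct in SUMMARY_FIELD.findall(line)
    }


def percent(count, length):
    """Integer percentage, truncated (assumed rounding rule)."""
    return 100 * count // length


if __name__ == "__main__":
    line = "Identities = 390/643 (60%), Positives = 480/643 (73%), Gaps = 29/643 (4%)"
    fields = parse_summary(line)
    count, length, stated = fields["Identities"]
    print(count, length, stated, percent(count, length))
```

Only the Identities field is re-derived in this sketch; the Positives and Gaps fields are parsed and reported as stated.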

SEQ ID 3024 (GBS297) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 43 (lane 6; MW 75kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 27 (lane 4; MW 100.2kDa) and in 
Figure 159 (lane 2-4; MW 100kDa). GBS297-GST was purified as shown in Figure 223, lane 3. GBS297-His 
was purified as shown in Figure 203, lane 8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 988 

A DNA sequence (GBSx1048) was identified in S.agalactiae <SEQ ID 3027> which encodes the amino 
acid sequence <SEQ ID 3028>. Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -7.91 Transmembrane 60 - 76 ( 50 - 90) 
INTEGRAL Likelihood = -7.43 Transmembrane 7 - 23 ( 3-25) 
INTEGRAL Likelihood = -5.68 Transmembrane 27 - 43 ( 24 - 46) 



Final Results 

bacterial membrane Certainty=0.4163 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB03323 GB:AB035448 hypothetical protein [Staphylococcus 
aureus] 

Identities = 53/230 (23%) , Positives = 104/230 (45%) , Gaps = 14/230 (6%) 

Query: 5 QFFLLVEAVVLVMGLMKILSDDWTSFIFILAL--ILLALRF-YNNDSRHNFLLTTSLLLL 61 

Q ++ A++++ I + F+ +L L +L+ + + Y + R LL+ 

Sbjct: 9 QMLIIFTALMIIANFYYIFFEK-IGFLLVLLLGCVLVYVGYLYFHKIRGLLAFWIGALLI 67 

Query: 62 FLIFMLNPY-IIAAVVFAVLYVLINHFSQVKKKNRYALIQFKNHQLDVKTTRNQWLGTDQ 120 

+ N Y II VF +L ++ + K K A + +K +W G + 
Sbjct: 68 AFTLLSNKYTIIILFVFLLLLIVRYLIHKFKPKKVVATDEVMTSPSFIK QKWFGEQR 124 

Query: 121 HESDFYAFEDINIIRISGTDTIDLTNVIVSGQDNVIIIQKVFGDTKVLVPLDVAVKADIS 180 
Y +ED+ I G IDLT ++N + G +V++P++ + ++ 
Sbjct: 125 TPVYVYKWEDVQIQHGIGDLHIDLTKAANIKENOTIVVEHILGKVQVILPVNYNINLHVA 184 

Query: 181 SVYGSVQYFDFEEYDLRNESIKLSQ--EEEYYLLKRVKLWNTIAGKVEV 228 

+ YGS Y++Y+N+I++ ++Y V+ V+T G VEV 
Sbjct: 185 AFYGST-YVNEKSYKVENNNIHTEEMMKPDNY TVNIYVSTFIGDVEV 230 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3029> which encodes the amino acid 
sequence <SEQ ID 3030>. Analysis of this protein sequence reveals the following: 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.92 Transmembrane 44 - 60 ( 36 - 64) 

INTEGRAL Likelihood = -8.76 Transmembrane 69 - 85 ( 66 - 105) 

INTEGRAL Likelihood = -8.70 Transmembrane 24 - 40 ( 20 - 42) 

INTEGRAL Likelihood = -6.64 Transmembrane 88 - 104 ( 85 - 105) 
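
By way of a hedged illustration, the INTEGRAL lines listed above can be collected into a table of predicted membrane-spanning segments. The Python sketch below is not taken from the original analysis. It assumes that the first pair of numbers is the predicted segment and that the parenthesised pair is a wider range around it, which is an interpretation not stated in the text; the name `parse_segments` and the dictionary keys are hypothetical.

```python
import re

# Matches e.g. "INTEGRAL Likelihood = -9.92 Transmembrane 44 - 60 ( 36 - 64)".
# Spacing is kept loose because the scanned text is irregular.
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*[({]\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return the predicted segments sorted by start position."""
    segments = []
    for m in TM_LINE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "segment": (int(m.group(2)), int(m.group(3))),
            "outer": (int(m.group(4)), int(m.group(5))),  # assumed meaning
        })
    return sorted(segments, key=lambda s: s["segment"][0])


if __name__ == "__main__":
    report = """
    INTEGRAL Likelihood = -9.92 Transmembrane 44 - 60 ( 36 - 64)
    INTEGRAL Likelihood = -8.76 Transmembrane 69 - 85 ( 66 - 105)
    INTEGRAL Likelihood = -8.70 Transmembrane 24 - 40 ( 20 - 42)
    INTEGRAL Likelihood = -6.64 Transmembrane 88 - 104 ( 85 - 105)
    """
    for seg in parse_segments(report):
        print(seg["segment"], seg["likelihood"])
```

Sorting by start position simply lists the segments in N-to-C order; the likelihood values are carried through unchanged.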

Final Results 

bacterial membrane Certainty=0.4970 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



The protein has homology with the following sequences in the databases: 

>GP:BAB03323 GB:AB035448 hypothetical protein [Staphylococcus 
aureus] 

Identities = 41/187 (21%) , Positives = 85/187 (44%) , Gaps = 22/187 (11%) 

Query: 47 FILILVL--ILLALRF-YNQDSRNNFLLTVSLLFLFLIFMLNPYIIMAVLLGIVYIFINH 103 

F+L+L+L +L+ ++Y R +L+ +NYI+ ++++++ 

Sbjct: 33 FLLVLLLGCVLVYVGYLYFHKIRGLLAFWIGALLIAFTLLSNKYTIIILFVFLLLLIV-- 90 

Query: 104 FSQVKKKNRFALIRFKEEKIEVNNT KHQWIGTANYESDYYCFDDINIIRISG 155 

R+ + +FK +K+ + K +W G Y ++D+ I G 

Sbjct: 91 -RYLIHKFKPKKVVATDEVMTSPSFIKQKWFGEQRTPVYVYKWEDVQIQHGIG 142 

Query: 156 NDTVDLTNVIVTGMDNIIVIRKIFGNTTILVPIDVTVTLDVSSIYGSVDFFRCQQYDLRN 215 

+ +DLT +N IV+R I G +++P++ + L V++ YGS + + Y + N 

Sbjct: 143 DLHIDLTKAANIKENNTIVVRHILGIWQVILPVNYNINLHVARFYGST-YVNEKSYKVEN 201 

Query: 216 ESIKFKE 222 

+ I +E 
Sbjct: 202 NNIHIEE 208 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 137/211 (64%) , Positives = 175/211 (82%) 

Query: 1 MKKFQFFLLVEAVVLVMGLMKILSDDWTSFIFILALILLALRFYNNDSRHNFLLTTSLLL 60 

MKKFQFFLL+E ++L MG+M IL +D +SFI IL LILLALRFYN DSR+NFLLT SLL 
Sbjct: 18 MKKFQFFLLIECILLAMGIMTILDNDLSSFILILVLILLALRFYNQDSRNNFLLTVSLLF 77 

Query: 61 LFLIFMLNPYIIAAVVFAVLYVLINHFSQVKKKNRYALIQFKNHQLDVKTTRNQWLGTDQ 120 

LFLIFMLNPYII AV+ ++Y+ INHFSQVKKKNR+ALI+FK +++V T++QW+GT 
Sbjct: 78 LFLIFMLNPYIIMAVLLGIVYIFINHFSQVKKKNRFALIRFKEEKIEVNNTKHQWIGTAN 137 

Query: 121 HESDFYAFEDINIIRISGTDTIDLTNVIVSGQDNVIIIQKVFGDTKVLVPLDVAVKADIS 180 

+ESD+Y F+DINIIRISG DT+DLTNVIV+G DN+I+I+K+FG+T +LVP+DV V D+S 
Sbjct: 138 YESDYYCFDDINIIRISGNDTVDLTNVIVTGMDNIIVIRKIFGNTTILVPIDVTVTLDVS 197 

Query: 181 SVYGSVQYFDFEEYDLRNESIKLSQEEEYYL 211 

S+YGSV +F ++YDLRNESIK + + L 
Sbjct: 198 SIYGSVDFFRCQQYDLRNESIKFKETDNQSL 228 



SEQ ID 3028 (GBS66) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 6 (lane 4; MW 25kDa) and in Figure 7 (lane 2; MW 24.7kDa). 




Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 989 

A DNA sequence (GBSx1049) was identified in S.agalactiae <SEQ ID 3031> which encodes the amino 
acid sequence <SEQ ID 3032>. This protein is predicted to be histidine kinase (narQ). Analysis of this 
protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-11.41 Transmembrane 47 - 63 ( 40 - 72) 
INTEGRAL Likelihood = -9.98 Transmembrane 9- 25 ( 5- 36) 



Final Results 

bacterial membrane Certainty=0 . 5564 (Affirmative) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 
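
As a minimal sketch, and not as a description of how the analysis above was actually carried out, the "Final Results" lines can be reduced to a single predicted location by taking the entry with the highest certainty. The Python below tolerates the stray spaces that appear inside the numbers in this text. The name `parse_final_results` is hypothetical, and choosing the maximum is an assumption about how the three values are meant to be read.

```python
import re

# Matches e.g. "bacterial membrane Certainty=0 . 5564 (Affirmative)".
# Dashes or other separators between the location and "Certainty" are skipped.
RESULT_LINE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9][0-9 .]*?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for each matching line."""
    results = {}
    for location, raw_value, verdict in RESULT_LINE.findall(text):
        # Remove spaces introduced inside the number by scanning.
        results[location] = (float(raw_value.replace(" ", "")), verdict)
    return results


if __name__ == "__main__":
    report = """
    bacterial membrane Certainty=0 . 5564 (Affirmative) < suco
    bacterial outside Certainty=0 . 0000 (Not Clear) < suco
    bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco
    """
    results = parse_final_results(report)
    best = max(results, key=lambda loc: results[loc][0])
    print(best, results[best])
```

For the values given above, the highest certainty is the membrane entry, which is also the only one marked "Affirmative".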

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB54570 GB:AJ006393 histidine kinase [Streptococcus pneumoniae] 
Identities = 159/334 (47%) , Positives = 239/334 (70%) , Gaps = 5/334 (1%) 

Query: 1 MKKHHYFLAFFYGSVIIFAICFVIIDSLGVNL-VHLYQTSRLWLIEQLIFSIFFLSLAVT 59 

MKK Y + + +F +++ L + + L+ + E+ +F + S+++T 
Sbjct: 1 MKKQAYVIIALTSFLFVFFFSHSLLEILDFDWSIFLHDVEKT EKFVFLLLVFSMSMT 57 

Query: 60 ILLLLTWFLLDDNSKRQINHNLRRILNNQSINVTDDGTEISTNIQRLSKKMNLMTASLQS 119 

LL L W +++ S R++ NL+R+L Q + D ++ + + LS K+NL+T +LQ 
Sbjct: 58 CLLALFWRGIEELSLRKMQANLKRLLAGQEWQVAD-PDLDASPKSLSGKIJILLTFjALQK 116 

Query: 120 KENSRILKSQEIVKQERKRIARDLHDTVSQDLFAASMVLSGIAQNVSQLDVDQVGSQLLA 179 

EN + + +EI+++ERKRIARDLHDTVSQ+LFAA M+LSGI+Q +LD +++ +QL + 
Sbjct: 117 AENQSLAQEEEIIEKERKRIARDLHDTVSQELFAAHMILSGISQQALKLDREKMQTQLQS 176 

Query: 180 VEEMLQHAQNDLRILLLHLRPVELENKTLSEGFRMILKELTDKSDIEVVYHESILTLPKK 239 

V +L+ AQ DLR+LLLHLRPVELE K+L EG +++LKEL DKSD+ V +++ LPKK 
Sbjct: 177 VTAILETAQKDLRVLLLHLRPVELEQKSLIEGIQILLKELEDKSDLRVSLKQNMTKLPKK 236 

Query: 240 IEDNIFRIGQEFISNTLKHSQASRLEVYLNQTENELQLKMIDNGIGFDMDSVYDLSYGLK 299 

IE++IFRI QE ISNTL+H+QAS L+VYL QT+ ELQLK++DNGIGF + S+ DLSYGL+ 
Sbjct: 237 IEEHIFRILQELISNTLRHAQASCLDVYLYQTDVELQLKVVDNGIGFQLGSLDDLSYGLR 296 

Query: 300 NIEDRVEDLAGNLQLLSQPGKGVAMDIRLPLVNQ 333 

NI++RVED+AG +QLL+ P +G+A+DIR+PL+++ 
Sbjct: 297 NIKERVEDMAGTVQLLTAPKQGLAVDIRIPLLDK 330 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2991> which encodes the amino acid 
sequence <SEQ ID 2992>. Analysis of this protein sequence reveals the following: 

Possible site: 18 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-14.22 Transmembrane 49 - 65 ( 42 - 70) 
INTEGRAL Likelihood = -6.58 Transmembrane 8 - 24 ( 5 - 33) 



Final Results 

bacterial membrane Certainty=0.6689 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 218/337 (64%), Positives = 276/337 (81%), Gaps = 3/337 (0%) 
Query: 1 MKKHHYFLAFFYGSVIIFAICFVIIDSLGVNLVHLYQTSRLWLIEQLIFSIFFLSLAVTI 60 
MKK +Y L + Y ++ I +I FV++D+LG+ +L + LW +E+L FSI L ++VT+ 
Sbjct: 1 MKKRYYALVWLYSTITILSIVFVVMDNLGITFNYL--RNHLWQVERLGFSILLLIVSVTL 58 

Query: 61 LLLLTWFLLDDNSKRQINHNLRRILNNQSINVTDDGTEISTNIQRLSKKMNLMTASLQSK 120 
LLLL W ++DDNSKR IN NL+ ILNN+ + + D+ +EI+TN+ RLSKKM+ +TA++Q K 
Sbjct: 59 LLLLLWIIMDDNSKRNINQNLKYILNNRRLYL-DETSEINTNLSRLSKKMSHLTANMQKK 117 

Query: 121 ENSRILKSQEIVKQERKRIARDLHDTVSQDLFAASMVLSGIAQNVSQLDVDQVGSQLLAV 180 
E++ IL SQE+VKQERKRIARDLHDTVSQ+LFA+S++LSGI+ ++ QLD Q+ +QL V 
Sbjct: 118 ESAYILDSQEVVKQERKRIARDLHDTVSQELFASSLILSGISMSLEQLDKTQLQTQLTTV 177 

Query: 181 EEMLQHAQNDLRILLLHLRPVELENKTLSEGFRMILKELTDKSDIEVVYHESILTLPKKI 240 
E MLQ+AQNDLRILLLHLRP EL N+TLSEG MILKELTDKSDIEV+Y E+I LPK + 
Sbjct: 178 EAMLQNAQNDLRILLLHLRPTELANRTLSEGLHMILKELTDKSDIEVIYKETIAQLPKTM 237 

Query: 241 EDNIFRIGQEFISNTLKHSQASRLEVYLNQTENELQLKMIDNGIGFDMDSVYDLSYGLKN 300 
EDN+FRI QEFISNTLKH++ASR+EVYLNQT ELQLKMID+G+GFDMD V DLSYGLKN 
Sbjct: 238 EDNLFRIAQEFISNTLKHAKASRIEVYLNQTSTELQLKMIDDGVGFDMDQVRDLSYGLKN 297 

Query: 301 IEDRVEDLAGNLQLLSQPGKGVAMDIRLPLVNQSEDK 337 
IEDRV DLAGNL L+SQ GKGV+MDIRLP+V +D+ 
Sbjct: 298 IEDRVNDIAGNLHLISQKGKGVSMDIRLPIVKGDDDE 334 

A related GBS gene <SEQ ID 8701> and protein <SEQ ID 8702> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 
McG: Discrim Score: 14.69 
GvH: Signal Score (-7.5): -4.31 

Possible site: 19 
>>> Seems to have an uncleavable N-term signal seq 
ALOM program count: 2 value: -11.41 threshold: 0.0 

INTEGRAL Likelihood =-11.41 Transmembrane 47 - 63 ( 40 - 72) 
INTEGRAL Likelihood = -9.98 Transmembrane 9 - 25 ( 5 - 36) 
PERIPHERAL Likelihood = 3.61 146 
modified ALOM score: 2.78 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.5564 (Affirmative) < suco 
bacterial outside Certainty=0.0000 (Not Clear) < suco 
bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

52.5/77.6% over 288aa 

Streptococcus pneumoniae 

GP| 5830526 | histidine kinase Insert characterized 
ORF00320(433 - 1302 of 1617) 

GP|5830526|emb|CAB54570.1| |AJ006393(43 - 331 of 331) histidine kinase {Streptococcus pneumoniae} 
%Match =28.6 
%Identity =52.4 %Similarity =77.6 
Matches = 152 Mismatches = 64 Conservative Sub.s = 73 

252 282 312 342 372 402 432 462 

QEEEYTF*NVSN*L*TLSLES*G*S*MKKHHYFLAFFYGSVIIFAICFVIIDSLGVNLVHLYQTSRLWLIEQLIFSIFFL 

= :| I ::: : |:::| :: : 

MKKQAYVT IALTSFLFVFFFSHSLLEILDFDWS I FLHDVEKTEKFVFLLLVF 
10 20 30 40 50 

492 522 552 582 612 642 672 702 

SLAVTILLLLTWFLLDDNSKRQINHNLRRILNNQSINVTDDGTEISTNIQRLSKKMNLMTASLQSKENSRILKS 
SMSMTCLLALFWRGIEELSLRKMQANLKRLIAGQEWQVAD- 

70 80 90 100 110 120 130 

732 762 792 822 852 882 912 942 

ERKRIARDLHDTVSQDLFAASMVLSGIAQNVSQLDVDQVGSQLLAVEEMLQHAQNDLRILLLHLRPVELENKTLSEGFRM 

IllllllllllllihIIII hlllhl :|| = = = =11 =1 = 1 = II I I h I I I I I I I I I I I hi II = = 
ERKRIARDLHDTVSQELFAAHMILSGISQQALKLDREKMQTQLQSVTAILETAQKDLRVLLLHLRPVELEQKSLIEGIQI 
150 160 170 180 190 200 210 



972 1002 1032 1062 1092 1122 1152 1182 

ILKELTDKSDIEVVYHESILTLPKKIEDNIFRIGQEFISNTLKHSQASRLEVYLNQTENELQLKMIDNGIGFDMDSVYDL 

:|||| l|||: I : = = hlllhhlll Ihllllhhlll hill Ih I I I I I :: I I I I I I : h II 
LLKELEDKSDLRVSLKQNMTKLPKKIEEHIFRILQELISNTLRHAQASCLDVYLYQTDVELQLKVVDNGIGFQLGSLDDL 
15 230 240 250 260 270 280 290 

1212 1242 1272 1302 1332 1362 1392 1422 

SYGLKNIEDRVEDLAGNLQLLSQPGKGVAMDIRLPLVNQSEDKNG*NKNCTC**P*DGSSRFKKFFKLTS*C*SNR*GLK 

lllhll :|llhll =111= I :hh||hlh = = 
SYGLRNIKERVEDMAGTVQLLTAPKQGLAVDIRIPLLDKE 

310 320 330 

SEQ ID 8702 (GBS31) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 15 (lane 8; MW 64kDa). It was also expressed as GBS31d in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 151 (lane 8-10; MW 59kDa) and 
in Figure 187 (lane 8; MW 59kDa). GBS31d was also expressed in E.coli as a His-fusion product. SDS-PAGE 
analysis of total cell extract is shown in Figure 151 (lane 11-13; MW 34kDa) and in Figure 182 (lane 
11; MW 34kDa). Purified GBS31d-GST is shown in lane 3 of Figure 237. 
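
The band sizes reported above for the GST- and His-fusions of one construct can be compared by simple arithmetic. The Python sketch below is illustrative only and forms no part of the original experiments. The figure of 0.110 kDa per residue is a common rule of thumb and not a value from this document, and the helper names `estimate_mass_kda` and `apparent_tag_kda` are hypothetical.

```python
# Rule-of-thumb average mass of one amino acid residue, in kDa (assumption).
AVG_RESIDUE_KDA = 0.110


def estimate_mass_kda(n_residues, tag_kda=0.0):
    """Rough mass of a polypeptide of n_residues plus an optional fusion tag."""
    return n_residues * AVG_RESIDUE_KDA + tag_kda


def apparent_tag_kda(gst_fusion_kda, his_fusion_kda):
    """Difference between the GST-fusion and His-fusion band sizes."""
    return gst_fusion_kda - his_fusion_kda


if __name__ == "__main__":
    # GBS31d: 59 kDa as a GST-fusion, 34 kDa as a His-fusion (values from the text).
    print(apparent_tag_kda(59, 34))
    # Rough mass of an untagged 100-residue polypeptide.
    print(estimate_mass_kda(100))
```

For GBS31d the difference between the two reported bands is 25 kDa, which is in keeping with the GST moiety being far larger than a His tag.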

Example 990 

A DNA sequence (GBSx1050) was identified in S.agalactiae <SEQ ID 3033> which encodes the amino 
acid sequence <SEQ ID 3034>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2706 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB54571 GB:AJ006393 response regulator [Streptococcus pneumoniae] 
Identities = 154/209 (73%), Positives = 184/209 (87%) 

Query: 8 IKIVLVDDHEMVRLGLKSFLNLQADVEVIGEASNGLEGIKKALELRPDVVVMDLVMPEMD 67 
+KI+LVDDHEMVRLGLKS+ +LQ DVEV+GEASNG +GI ALELRPDV+VMD+VMPEM+ 
Sbjct: 1 MKILLVDDHEMVRLGLKSYFDLQDDVEVVGEASNGSQGIDLALELRPDVIVMDIVMPEMN 60 

Query: 68 
G++ATLA+LK+WPEA IL++TSYLDNEKI PV++AGAKGYMLKTSSA E+L+A+ KV+ G 
Sbjct: 61 

Query: 128 
E AIE EV KK++ H LHE LTARERD+L L+AKGY+NQRIAD+LFISLKTVKTHV 
Sbjct: 121 

Query: 188 
SNIL KL V+DRTQA VYAFQHHLV Q++ 
Sbjct: 181 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2995> which encodes the amino acid 
sequence <SEQ ID 2996>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3094 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 175/212 (82%) , Positives = 192/212 (90%) 



Query: 5 MDKIKIVLVDDHEMVRLGLKSFLNLQADVEVIGEASNGLEGIKKALELRPDVVVMDLVMP 64 
M KIK++LVDDHEMVR+GLKSFLNLQAD++V+GEASNG EG+ AL L+PDV+VMDLVMP 
Sbjct: 3 MSKIKVILVDDHEMVRMGLKSFLNLQADIDVVGEASNGREGVDLALALKPDVLVMDLVMP 62 

Query: 65 EMDGVEATLALLKDWPEAAILVLTSYLDNEKIYPVIEAGAKGYMLKTSSAAEILNAIRKV 124 
E+ GVEATL +LK W EA +LVLTSYLDNEKIYPVI+AGAKGYMLKTSSAAEILNAIRKV 
Sbjct: 63 ELGGVEATLEVLKKWKEAKVLVLTSYLDNEKIYPVIDAGAKGYMLKTSSAAEILNAIRKV 122 

Query: 125 SRGEQAIENEVDKKIKAHDKCPALHEGLTARERDILNLLAKGYDNQRIADELFISLKTVK 184 
S+GE AIE EVDKKIKAHD+ P LHE LTARE DIL+LLAKGYDNQ IADELFISLKTVK 
Sbjct: 123 SKGELAIETEVDKKIKAHDQHPDLHEELTAREYDILHLLAKGYDNQTIADELFISLKTVK 182 

Query: 185 THVSNILGKLNVADRTQAVVYAFQHHLVPQDD 216 
THVSNIL KL V DRTQAVVYAF+HHLVPQDD 
Sbjct: 183 THVSNILAKLEVGDRTQAVVYAFRHHLVPQDD 214 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 991 

A DNA sequence (GBSx1051) was identified in S.agalactiae <SEQ ID 3035> which encodes the amino 
acid sequence <SEQ ID 3036>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1688 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB08166 GB:Z94864 putative peptidyl-prolyl cis-trans isomerase 
[Schizosaccharomyces pombe] 
Identities = 81/174 (46%) , Positives = 109/174 (62%) , Gaps = 30/174 (17%) 

Query: 288 IKTNHGDMTVKLFPDHAPKTVANFIGLAKQGYYDGIIFHRIIPDFMIQGGDPTGTGMGGE 347 

++T+ G + ++L+ +HAPKT NF LAK+GYYDG+IFHR+IPDF+IQGGDPTGTG GG 
Sbjct: 6 LQTSLGKILIELYTEHAPKTCQNFYTLAKEGYYDGVIFHRVIPDFVIQGGDPTGTGRGGT 65 

Query: 348 SIYGESFEDEFSEELYNV-RGALSMANAGPNTNGSQFFIVQNTKIPYAKKELERGGWPTP 406 

SIYG+ F+DE +L++ G LSMANAGPNTN SQFFI T P 
Sbjct: 66 SIYGDKFDDEIHSDLHHTGAGILSMANAGPNTNSSQFFI TLAP 108 

Query: 407 IAELYAGQGGTPHLDRRHSVFGQLVDQSSFEVLDEIAAVETGSQDKPLEDVVIL 460 

TP LD +H++FG++V S V + + T S D+P+E + I+ 
Sbjct: 109 TPWLDGKHTIFGRVV--SGLSVCKRMGLIRTDSSDRPIEPLKII 150 
A related DNA sequence was identified in S.pyogenes <SEQ ID 3037> which encodes the amino acid 
sequence <SEQ ID 3038>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2175 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 381/464 (82%) , Positives = 422/464 (90%) 

Query: 1 MDAKTKYKAKKIKAVFFDIDDTLRVKDTGYMPPSILKVFKALKDKGIVVGIASGRARYGV 60 
MDAK KYKAKKIK VFFDIDDTLRVKDTGYMP SI +VFKALK KGI+VGIASGRARYGV 
Sbjct: 5 MDAKLKYKAKKIKMVFFDIDDTLRVKDTGYMPESIQRVFKALKAKGILVGIASGRARYGV 64 

Query: 61 PKEVQDLNADYCVKLNGAYVKDKDKNIIFHRPIPAEYVEQYKKWADTVGIKYGLAGRHEA 120 
P+EVQDL+ADYCVKLNGAYVKD K IIF PIPA+ V YKKWAD +GI YG+AGRHEA 
Sbjct: 65 PQEVQDLHADYCVKLNGAYVKDDAKTIIFQAPIPADVVVAYKKWADDMGIFYGMAGRHEA 124 

Query: 121 VLSDRDDLVNDAIDIVYSDLEVNPDFNKEHDIYQMWTFEDKGDSLHLPEPLAEHLRLIRW 180 
VLS R+D++++AID VY+ LEV PD+N+ HD+YQMWTFEDKGD L LP LAEHLRL+RW 
Sbjct: 125 VLSARNDMISNAIDNVYAQLEVCPDYNEYHDVYQMWTFEDKGDGLQLPAELAEHLRLVRW 184 

Query: 181 HDHSSDVVLKGTSKALGVSKVVEHLGLKPENILVFGDELNDLELFDYAGLAVAMGVSHPE 240 
HD+SSDVVLKGTSKALGVSKVV+HLGLKPENILVFGDELNDLELFDYAG+++AMGVSHP 
Sbjct: 185 HDNSSDVVLKGTSKALGVSKVVDHLGLKPENILVFGDELNDLELFDYAGISIAMGVSHPL 244 

Query: 241 AQKKADFITKKVEEDGILYALEELGLIEKELTFPQVDIENTEGPVAVIKTNHGDMTVKLF 300 
Q+KADFITKKVEEDGILYALEELGLI+KEL FPQ+D+ N +GP A IKTNHGDMT+ LF 
Sbjct: 245 LQEKADFITKKVEEDGILYALEELGLIDKELQFPQLDLPNHKGPKATIKTNHGDMTLVLF 304 

Query: 301 PDHAPKTVANFIGLAKQGYYDGIIFHRIIPDFMIQGGDPTGTGMGGESIYGESFEDEFSE 360 
PDHAPKTVANF+GLAK+GYYDGIIFHRIIP+FMIQGGDPTGTGM G+SIYGESFEDEFS+ 
Sbjct: 305 PDHAPKTVANFLGLAKEGYYDGIIFHRIIPEFMIQGGDPTGTGMCGQSIYGESFEDEFSD 364 

Query: 361 ELYNVRGALSMANAGPNTNGSQFFIVQNTKIPYAKKELERGGWPTPIAELYAGQGGTPHL 420 
ELYN+RGALSMANAGPNTNGSQFFIVQN+KIPYAKKELERGGWP PIA YA GGTPHL 
Sbjct: 365 ELYNLRGALSMANAGPNTNGSQFFIVQNSKIPYAKKELERGGWPAPIAASYAANGGTPHL 424 

Query: 421 DRRHSVFGQLVDQSSFEVLDEIAAVETGSQDKPLEDVVILTIEV 464 
DRRH+VFGQLVD++SF+VLD IA VETG+QDKP EDV+I TIEV 
Sbjct: 425 DRRHTVFGQLVDETSFQVLDLIAGVETGAQDKPKEDVIIETIEV 468 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 992 

A DNA sequence (GBSx1052) was identified in S.agalactiae <SEQ ID 3039> which encodes the amino 
acid sequence <SEQ ID 3040>. This protein is predicted to be ribosomal protein S1 (rpsA). Analysis of this 
protein sequence reveals the following: 

Possible site: 36 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3126 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 
The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07066 GB:AP001518 polyribonucleotide nucleotidyltransferase 
(general stress protein 13) [Bacillus halodurans] 
Identities = 46/120 (38%) , Positives = 71/120 (58%) , Gaps = 11/120 (9%) 

Query: 8 KIGDKLKGTVTGIRPYGAFVSLEDGRTGLIHISEIKTGYIDNIYDVLSVGDEVYVQVIDV 67 

++G ++G VTGI+P+GAFV+++D + GL+HISE+ G++ +I DVLSVGDEV V+++ V 
Sbjct: 5 EVGSIVEGKOTGIKPFGAWAIDDQKQGIiTOISEVMGFVTOIlTOVliSVGDEVKVKILSV 64 

Query: 68 DEFTQKASLSLRTLEEERHHIQH RHRFSNNRLKIGFKPLEENLPSWVEE 116 

DE + K SLS+R +E R GF LE+ L W+++ 

Sbjct: 65 DEESGKISLSIRATQEAPERPARAPKPRPAGGGGRKPQKGQSQGQGFNTLEDKLKEWLKQ 124 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3041> which encodes the amino acid 
sequence <SEQ ID 3042>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1832 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 78/115 (67%) , Positives = 100/115 (86%) 

Query: 7 MKIGDKLKGTVTGIRPYGAFVSLEDGRTGLIHISEIKTGYIDNIYDVLSVGDEVYVQVID 66 

MKIGDKL GT+TGI+PYGAFV+LE+G TGLIHISEIKTG+ID+I +L++G++V VQVID 
Sbjct: 1 MKIGDKLHGTITGIKPYGAFVALENGTTGLIHISEIKTGFIDDIDQLLAIGNQVLVQVID 60 

Query: 67 VDEFTQKASLSLRTLEEERHHIQHRHRFSNNRLKIGFKPLEENLPSWVEEGLAYL 121 

+DE+++K SLS+RTL EE+ H HRHR+SN+R KIGF+PLEE LP W+EE L +L 
Sbjct: 61 IDEYSKKPSLSMRTLAEEKQHFFHRHRYSNSRHKIGFRPLEEQLPQWIEESLQFL 115 
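
For illustration, the identity count quoted for an alignment can be re-derived from its Query and Sbjct rows by comparing them column by column. The Python sketch below does this for the two rows of the GAS/GBS alignment above. It is not the program that produced the alignment, and the function name `count_identities` is hypothetical.

```python
def count_identities(query, sbjct, gap="-"):
    """Count columns in which both aligned rows carry the same residue."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != gap)


# Rows copied from the GAS/GBS alignment above (Query 7-121, Sbjct 1-115).
ROWS = [
    ("MKIGDKLKGTVTGIRPYGAFVSLEDGRTGLIHISEIKTGYIDNIYDVLSVGDEVYVQVID",
     "MKIGDKLHGTITGIKPYGAFVALENGTTGLIHISEIKTGFIDDIDQLLAIGNQVLVQVID"),
    ("VDEFTQKASLSLRTLEEERHHIQHRHRFSNNRLKIGFKPLEENLPSWVEEGLAYL",
     "IDEYSKKPSLSMRTLAEEKQHFFHRHRYSNSRHKIGFRPLEEQLPQWIEESLQFL"),
]

if __name__ == "__main__":
    total = sum(count_identities(q, s) for q, s in ROWS)
    columns = sum(len(q) for q, _ in ROWS)
    print(total, columns)
```

Applied to the two rows above, the count is 44 + 34 = 78 identical positions over 115 columns, which agrees with the stated "Identities = 78/115".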

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 993 

A DNA sequence (GBSx1053) was identified in S.agalactiae <SEQ ID 3043> which encodes the amino 
acid sequence <SEQ ID 3044>. This protein is predicted to be pyruvate formate-lyase 2 activating enzyme 
(pflA). Analysis of this protein sequence reveals the following: 

Possible site: 41 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2889 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76934 GB:AE000469 probable pyruvate formate lyase activating 
enzyme 2 [Escherichia coli K12] 
Identities = 90/251 (35%) , Positives = 142/251 (55%) , Gaps = 16/251 (6%) 

Query: 8 VFNIQHFSIHDGPGIRTTVFLKGCPLRCPWCANPESQKMVPETMR 52 
+FNIQ +S++DG GIRT VF KGCP CPWCANPES +T+R 
Sbjct: 24 IFNIQRYSLNDGEGIRTVVFFKGCPHLCPWCANPESISGKIQTVRREAKCLHCAKCLRDA 83 

Query: 53 -DAITNESVIVGEEKSVDDIIEEVLKDIDFYEESGGGITLSGGEIFAQFEFAKAILKRAK 111 
+ + +G + S+D + EV+KD F+ SGGG+TLSGGE+ Q EFA L+R + 
Sbjct: 84 DECPSGAFERIGRDISLDALEREVMKDDIFFRTSGGGVTLSGGEVLMQAEFATRFLQRLR 143 

Query: 112 SLGIHTAIETTAYTRHEQFIDLIQYVDFIYTDLKHYNSLKHQEKTMVKNASIIKNIHYAF 171 
G+ AIET + + L + D + DLK ++ + ++ + +++N+ 
Sbjct: 144 LWGVSCAIETAGDAPASKLLPLAKLCDEVLFDLKIMDATQARDVVKMNLPRVLENLRLLV 203 

Query: 172 ANGKTIVLRIPVIPNFNDSLEDAEEFACLFDRLDIRQVQLLPFHQFGQNKYQLLNRQYEM 231 
+ G ++ R+P+IP F S E+ ++ + L+IRQ+ LLPFHQ+G+ KY+LL + + M 
Sbjct: 204 SEGVIWIPRLPLIPGFTLSRENMQQALDVLIPLNIRQIHLLPFHQYGEPKYRLLGKTWSM 263 

Query: 232 EEIAALHPEDL 242 
+E+ A D+ 
Sbjct: 264 KEVPAPSSADV 274 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3045> which encodes the amino acid 
sequence <SEQ ID 3046>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 187/255 (73%) , Positives = 220/255 (85%) 



Query: 4 EKGIVFNIQHFSIHDGPGIRTTVFLKGCPLRCPWCANPESQKMVPETMRDAITNESVIVG 63 
++GIVFNIQHFSIHDGPGIRTTVFLKGCPLRCPWCANPESQ+ PE M + + IVG 
Sbjct: 3 DRGIVFNIQHFSIHDGPGIRTTVFLKGCPLRCPWCANPESQQKAPEQMLTSDGLNTKIVG 62 

Query: 64 EEKSVDDIIEEVLKDIDFYEESGGGITLSGGEIFAQFEFAKAILKRAKSLGIHTAIETTA 123 
EEK+VD++IEEVLKD+DFYEESGGG+TLSGGEIFAQF+FA A+LK AK+ G+HTAIETTA 
Sbjct: 63 EEKTVDEVIEEVLKDLDFYEESGGGMTLSGGEIFAQFDFALALLKAAKAAGLHTAIETTA 122 

Query: 124 YTRHEQFIDLIQYVDFIYTDLKHYNSLKHQEKTMVKNASIIKNIHYAFANGKTIVLRIPV 183 
+ +HEQF+ L+ YVDFIYTDLKHYN L+HQ+ T V+N IIKNIHYAF GK IVLRIPV 
Sbjct: 123 FAKHEQFVTLVDYVDFIYTDLKHYNQLRHQKVTGVRNDLIIKNIHYAFQAGKEIVLRIPV 182 

Query: 184 IPNFNDSLEDAEEFACLFDRLDIRQVQLLPFHQFGQNKYQLLNRQYEMEEIAALHPEDLL 243 
IP FNDSL+DA+ F+ LF++L+I QVQLLPFHQFG+NKY+LL R+YEM E+ A HPEDL 
Sbjct: 183 IPQFNDSLDDAKAFSELFNQLEIDQVQLLPFHQFGENKYKLLGREYEMAEVKAYHPEDLA 242 

Query: 244 DYQAIFSKYNIHCYF 258 
DYQA+F +NIHCYF 
Sbjct: 243 DYQAVFLNHNIHCYF 257 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 994 

A DNA sequence (GBSx1054) was identified in S.agalactiae <SEQ ID 3047> which encodes the amino 
acid sequence <SEQ ID 3048>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2209 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

Final Results 

bacterial cytoplasm --- Certainty=0.1762 (Affirmative) < suco 
bacterial membrane --- Certainty=0.0000 (Not Clear) < suco 
bacterial outside Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 9299> which encodes amino acid sequence <SEQ ID 9300> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC74366 GB:AE000226 putative DEOR-type transcriptional 
regulator [Escherichia coli K12] 
Identities = 74/177 (41%) , Positives = 113/177 (63%) , Gaps = 1/177 (0%) 

Query: 2 NRLENIISLVSQYQKIDVNTLSELLQVSKVTIRKDLDKLEGKGLLHREHGYAVLNSGDDL 61 

+R + I+ +V ++ V L++ VS+VTIR+DL+ LE L R HG+AV DD+ 
Sbjct: 3 SRQQTILQMVIDQGQVSVTDLAKATGVSEVTIRQDLNTLEKLSYLRRAHGFAVSLDSDDV 62 

Query: 62 NVRLSFNHKTKKEIAAIAANMVSDNDTILIESGSTCALLAENICQTKRNVTILTNSCFIA 121 
R+ N+ K+E+A AA++V +TI IE+GS+ AL+A + + K+NVTI+T S +IA 

Sbjct: 63 ETRMMSNYTLKRELAEFAASLVQPGETIFIENGSSNALIARTLGEQKKNVTIITVSSYIA 122 

Query: 122 NYLREYDSCQIVLLGGEYQSSSQVTVGPLLKKMISLFHVSLAFVGTDGFDPKTRIYG 178 
+ L++ C+++LLGG YQ S+ VGPL ++ I H S AF+G DG+ P+T G 
Sbjct: 123 HLLKD-APCEVILLGGVYQKKSESMVGPLTRQCIQQVHFSKAFIGIDGWQPETGFTG 178 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3049> which encodes the amino acid 
sequence <SEQ ID 3050>. Analysis of this protein sequence reveals the following: 

Possible site: 58 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2888 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 
bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 131/171 (76%) , Positives = 150/171 (87%) 

Query: 1 MNRLENIISLVSQYQKIDVNTLSELLQVSKVTIRKDLDKLEGKGLLHREHGYAVLNSGDD 60 

MNRLE II LVSQ +KIDVN+LSE L VSKVTIRKDLDKLE KGLL REHGYAVLNSGDD 
Sbjct: 2 MNRLERIIQLVSQKKKIDVNSLSEQLDVSKVTIRKDLDKLESKGLLRREHGYAVLNSGDD 61 

Query: 61 LNVRLSFNHKTKKEIAAIAANMVSDNDTILIESGSTCALLAENICQTKRNVTILTNSCFI 120 
LNVRLS+N+ K+ IA AA +V DNDTI+IESGSTCALLAE +CQTKRN+ ++TNSCFI 

Sbjct: 62 LNVRLSYNYNIKRRIAEKAAELVQDNDTIMIESGSTCALLAEVXCQTKRNIKVITNSCFI 121 

Query: 121 ANYLREYDSCQIVLLGGEYQSSSQVTVGPLLKKMISLFHVSLAFVGTDGFD 171 
ANY+R+Y SCQI+LLGG YQ +S+VTVGPLLK+MISLFHV+ FVGTDGF+ 
Sbjct: 122 ANYIRQYSSCQIILLGGYYQPNSEVTVGPLLKEMISLFHVNRVFVGTDGFN 172 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 995 

A DNA sequence (GBSx1055) was identified in S.agalactiae <SEQ ID 3051> which encodes the amino 
acid sequence <SEQ ID 3052>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1672 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 
bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG04879 GB:AE004578 probable transcriptional regulator 
[Pseudomonas aeruginosa] 
Identities = 20/70 (28%), Positives = 40/70 (56%) 

Query: 6 GFMGRDLMRSEVAQEMANAADEVIILTDSSKFNQTALVEQLPLSTVSQVITDKHPNSEIA 65 

G M + +E+A+ M A ++ ++ DSSK + AL + PLS +++++ D+ P E+ 
Sbjct: 179 GAMDFSIEEAEIARAMIAQARQLTVIADSSKLGRRALFQVFPLSRINRLVVDRKPTGELW 238 

Query: 66 NLFQEAEITI 75 

Q+A + + 
Sbjct: 239 EALQQARVEV 248 

There is also homology to SEQ ID 3050. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 996 

A DNA sequence (GBSx1056) was identified in S.agalactiae <SEQ ID 3053> which encodes the amino 
acid sequence <SEQ ID 3054>. This protein is predicted to be a transcriptional regulator. Analysis of this 
protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0904 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 
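Each "Final Results" block lists three candidate localisations with a certainty score. As a hedged illustration only, the sketch below extracts those scores and picks the highest one. The function names and the regular expression are assumptions made for this example, and the pattern tolerates spaces inside the decimal (for example "0. 0904") because transcribed copies of such reports sometimes contain them.

```python
import re

# Hypothetical parser for "Final Results" lines of a localisation report.
RESULT_RE = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\b.*?"
    r"Certainty\s*=\s*([0-9][0-9. ]*?)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Return a list of (site, certainty, verdict) tuples in document order."""
    results = []
    for site, number, verdict in RESULT_RE.findall(text):
        # Remove any spaces inside the number before converting it.
        results.append((site, float(number.replace(" ", "")), verdict.strip()))
    return results

def best_site(text):
    """Return the (site, certainty, verdict) tuple with the highest certainty."""
    return max(parse_final_results(text), key=lambda r: r[1])

block = """
bacterial cytoplasm --- Certainty=0.0904 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
print(best_site(block))
```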

A related GBS nucleic acid sequence <SEQ ID 9541> which encodes amino acid sequence <SEQ ID 9542> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04499 GB:AP001509 transcriptional regulator [Bacillus halodurans] 
Identities = 98/309 (31%), Positives = 178/309 (56%), Gaps = 1/309 (0%) 

Query: 6 ERQKLLAKVAYLYYMEGKSQSEIANELGIYRTTISRMLAKAREEGLVRIEISDFNPEIFQ 65 
E ++L+ KVA LYY EG +Q+++A ++G+ R IS++L KA+E+G+V I I D N + 
Sbjct: 5 EERRLIVKVASLYYFEGWTQAQVAKKIGVSRPVISKLLNKAKEQGIVEIYIKDENIHTVE 64 

Query: 66 LESYFKSKYHLKDIEIVSSRKDSDTSEIEKDIAHVAAAMIRKKIKENDKVGIAWGRTLSK 125 
LE + KYHLK+ +V + I++ + + + K IK D +GI+WG T+S 
Sbjct: 65 LEQRLEKKYHLKEAIWPT-SGLTQDMIKRAIGKATSYYVSKNIKGMDSIGISWGTTVSS 123 

Query: 126 VVEAMRPHPVSQVSFVPLAGGPSHINARYHVNTLVYEMSRRFQGSCTFINATLVQENANL 185 
V+ ++ +PL GG H N L YE++++ C+++ A + E L 
Sbjct: 124 WQEYPYEQHRELKVIPLVGGMGRKFVELHSNLIAYEIAKKMNCECSYLYAPAMVEAKEL 183 

Query: 186 AKGILTSKYFEGLMDNWEKLDVAIVGVGGKPKSNEQQWLDLLNQDDFQCLDEEAAVGEIT 245 
+ ++ S+ +++ + +A+VG+G K + + ++ L ++D L + AVG+++ 
Sbjct: 184 KERLIQSEDIASVLEEGRNVKMAVVGIGSPFKGSTMKVMNYLKEEDIATLKKIGAVGDMS 243 

Query: 246 CRFFNHSGDPVVQHLAKRTIGITLEQLQKVPNRIAVAHGNYKAAALLAVLKKGYINHLVT 305 
RF++ G P++ L + IGI L++L+++P I V+ G +K ++ A LK GY++ LVT 
Sbjct: 244 SRFYDALGQPIDHPIJSIELVIGIDLDELKRIPIVIGVSEGftHKVDSVEAALKGGYLDVLVT 303 

Query: 306 DFSTALNIL 314 
D STA +++ 
Sbjct: 304 DDSTAQSLI 312 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3055> which encodes the amino acid 
sequence <SEQ ID 3056>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2123 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 165/324 (50%), Positives = 238/324 (72%), Gaps = 1/324 (0%) 

Query: 3 MKLERQKLLAKVAYLYYMEGKSQSEIANELGIYRTTISRMLAKAREEGLVRIEISDFNPE 62 
MK ER++LLAKVAYL+Y++GKSQ+ I+ E+ IYRTT+ RMLAKA+EEG+VRIEI+D++ + 
Sbjct: 1 MKEERRRLLAKVAYLHYVQGKSQTLISKEMNIYRTTVCRMLAKAKEEGIVRIEIADYDAD 60 

Query: 63 IFQLESYFKSKYHLKDIEIVSSRKDSDTSEIEKDLAHVAAAMIRKKIKENDKVGIAWGRT 122 
+F LE Y + +Y L+ +++V ++ + + ++A AA + R +K+ DK+G++WG T 
Sbjct: 61 LFALEEYVRQQYGLEKLDLVPNQVEDTPMDTLTNVAKTAAEVFRHWKDGDKIGLSWGAT 120 

Query: 123 LSKVVEAMRPHPVSQVSFVPLAGGPSHINARYHVNTLVYEMSRRFQGSCTFINATLVQEN 182 
LS +++ + P + V PLAGGPSHINA+YHVNTLVY ++R F G+ F+NA ++QE+ 
Sbjct: 121 LSCLMDELNPKAMKDVFIYPLAGGPSHINAKYHVNTLVYRLARIFHGNSAFMNAMVIQED 180 

Query: 183 ANLAKGILTSKYFEGLMDNWEKLDVAIVGVGGKPKSNEQ-QWLDLLNQDDFQCLDEEAAV 241 
+LAKGIL SKYF ++ +W++LD+A+VG+GG+P S EQ QW DLL D L E AV 
Sbjct: 181 KHLAKGILQSKYFNDILTSWDQLDLALVGIGGEPNSLEQSQWRDLLTSSDHDQLKYEKAV 240 

Query: 242 GEITCRFFNHSGDPVVQHLAKRTIGITLEQLQKVPNRIAVAHGNYKAAALLAVLKKGYIN 301 
GE+ CRFF+ +G PV L RTIGI+LEQL++VP +AVA G +KA A+LA LK G+IN 
Sbjct: 241 GEVCCRFFDQAGQPVYTGLQDRTIGISLEQLRRVPKTMAVATGKHKAKAILAALKAGFIN 300 

Query: 302 HLVTDFSTALNILRLDKDTFVDTI 325 
+LVTD T L +L LD+D ++ + 
Sbjct: 301 YLVTDKETMLAVLALDEDIDLNNV 324 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 997 

A DNA sequence (GBSx1057) was identified in S.agalactiae <SEQ ID 3057> which encodes the amino 
acid sequence <SEQ ID 3058>. This protein is predicted to be PTS enzyme III cel (celC). Analysis of this 
protein sequence reveals the following: 

Possible site: 55 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9543> which encodes amino acid sequence <SEQ ID 9544> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA23551 GB:M93570 PTS enzyme III cel [Escherichia coli] 
Identities = 42/102 (41%), Positives = 70/102 (68%) 

Query: 4 EIIVADQIIMGLILNAGDAKQHIYQALKLAKEGNFAESKIEIELADSALLEAHNLQTQFL 63 
E+ ++++MGLI+N+G A+ Y ALK AK+G+FA +K ++ + AL EAH +QT+ + 
Sbjct: 13 EVEELEEVVMGLIINSGQARSLAYAALKQAKQGDFAAAKAMMDQSRMALNEAHLVQTKLI 72 

Query: 64 AQEAGGTRTDISALFIHSQDHLMTSITEINLIKEIIDLRQEL 105 
+AG + +S + +H+QDHLMTS+ LI E+I+L ++L 
Sbjct: 73 EGDAGEGKMKVSLVLVHAQDHLMTSMLARELITELIELHEKL 114 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3059> which encodes the amino acid 
sequence <SEQ ID 3060>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAC74806 GB:AE000268 PEP-dependent phosphotransferase enzyme III 
for cellobiose, arbutin, and salicin [Escherichia coli] 
Identities = 39/97 (40%), Positives = 66/97 (67%) 

Query: 7 DQIIMGLILNAGDAKQHIYQALKCAKEDDYATSEKEMALADDALLEAHNLQTQFLAQEAS 66 
++++MGLI+N+G A+ Y ALK AK+ D+A ++ M + AL EAH +QT+ + +A 
Sbjct: 18 EEVVMGLIINSGQARSLAYAALKQAKQGDFAAAKAMMDQSRMALNEAHLVQTKLIEGDAG 77 

Query: 67 GNKSEITALFVHSQDHLMTTITEINLIKEIIDLRKEL 103 
K +++ + VH+QDHLMT++ LI E+I+L ++L 
Sbjct: 78 EGKMKVSLVLVHAQDHLMTSMLARELITELIELHEKL 114 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 81/103 (78%), Positives = 94/103 (90%) 

Query: 3 MEIIVADQIIMGLILNAGDAKQHIYQALKLAKEGNFAESKIEIELADSALLEAHNLQTQF 62 
M++IV DQIIMGLILNAGDAKQHIYQALK AKE ++A S+ E+ LAD ALLEAHNLQTQF 
Sbjct: 1 MQVIVPDQIIMGLILNAGDAKQHIYQALKCAKEDDYATSEKEMALADDALLEAHNLQTQF 60 

Query: 63 LAQEAGGTRTDISALFIHSQDHLMTSITEINLIKEIIDLRQEL 105 
LAQEA G +++I+ALF+HSQDHLMT+ITEINLIKEIIDLR+EL 
Sbjct: 61 LAQEASGNKSEITALFVHSQDHLMTTITEINLIKEIIDLRKEL 103 
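Every Query and Sbjct line carries its start and end coordinates, so the number of residues on the line can be checked against them. The sketch below is an illustrative consistency check under stated assumptions: gap characters do not consume a coordinate, and spaces inside the sequence are ignored because they can be introduced when the text is transcribed. The function name `check_alignment_line` is hypothetical.

```python
import re

# Hypothetical checker for lines of the form
#   "<label>: <start> <sequence> <end>"
LINE_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(.*?)\s+(\d+)\s*$")

def check_alignment_line(line):
    """Return (label, expected_residues, observed_residues), or None if unparsable."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    label, start, seq, end = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    # Count letters only: "-" marks a gap and spaces are treated as artifacts.
    observed = sum(1 for ch in seq if ch.isalpha())
    return label, end - start + 1, observed

line = "Query: 63 LAQEAGGTRTD I SALF IHSQDHLMTS I TE INLI KEI IDLRQEL 105"
label, expected, observed = check_alignment_line(line)
print(label, expected, observed, "consistent" if expected == observed else "check line")
```

A mismatch between the two counts does not say which residue is wrong, only that the line deserves a second look against the source.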


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 998 

A DNA sequence (GBSx1058) was identified in S.agalactiae <SEQ ID 3061> which encodes the amino 
acid sequence <SEQ ID 3062>. This protein is predicted to be PTS system, cellobiose-specific IIB 
component (celA). Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF94440 GB:AE004207 PTS system, cellobiose-specific IIB 
component [Vibrio cholerae] 
Identities = 46/100 (46%), Positives = 62/100 (62%) 

Query: 1 MIKIGLFCAAGFSTGMLVNNMKIAADKEGIEAHIEAYSQGKIADYAKDLDVALLGPQVSY 60 
M KI L C+AG ST MLV M+ AA+ +GIE I+A S + ++ DV LLGPQV + 
Sbjct: 1 MKKILLCCSAGMSTSMLVKKMQQAAESKGIECKIDALSVNAFEEAIQEYDVCLLGPQVRF 60 

Query: 61 TLDKSKSICDEYGVPIAVIPMADYGMLDGVKVLKLALSLL 100 
L++ + DEYG IA I YGM+ G +VL+ AL L+ 
Sbjct: 61 QLEELRKTADEYGKNIAAISPQAYGMMKGDEVLQQALDLI 100 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3063> which encodes the amino acid 
sequence <SEQ ID 3064>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAF94440 GB:AE004207 PTS system, cellobiose-specific IIB 
component [Vibrio cholerae] 
Identities = 43/100 (43%), Positives = 58/100 (58%) 

Query: 8 MIKIGLFCAAGFSTGMLVNNMKVAAEKKGIDCQIEAYAQGKLADYAPLLDVALLGPQVAY 67 
M KI L C+AG ST MLV M+ AAE KGI+C+I+A + + DV LLGPQV + 
Sbjct: 1 MKKILLCCSAGMSTSMLVKKMQQAAESKGIECKIDALSVNAFEEAIQEYDVCLLGPQVRF 60 

Query: 68 TLDKSEAICKDNDIPIAVIPMADYGMLDGNKVLDLALSLV 107 
L++ + IA I YGM+ G++VL AL L+ 
Sbjct: 61 QLEELRKTADEYGKNIAAISPQAYGMMKGDEVLQQALDLI 100 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 79/101 (78%), Positives = 92/101 (90%) 

Query: 1 MIKIGLFCAAGFSTGMLVNNMKIAADKEGIEAHIEAYSQGKIADYAKDLDVALLGPQVSY 60 
MIKIGLFCAAGFSTGMLVNNMK+AA+K+GI+ IEAY+QGK+ADYA LDVALLGPQV+Y 
Sbjct: 8 MIKIGLFCAAGFSTGMLVNNMKVAAEKKGIDCQIEAYAQGKLADYAPLLDVALLGPQVAY 67 

Query: 61 TLDKSKSICDEYGVPIAVIPMADYGMLDGVKVLKLALSLLE 101 
TLDKS++IC + +PIAVIPMADYGMLDG KVL LALSL++ 
Sbjct: 68 TLDKSEAICKDNDIPIAVIPMADYGMLDGNKVLDLALSLVK 108 

SEQ ID 3062 (GBS180) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 39 (lane 4; MW 12.6 kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 41 (lane 2; MW 37.6 kDa). 
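As a rough plausibility check on the reported fusion sizes, a protein mass can be estimated from its length. The sketch below is a back-of-the-envelope estimate, not the method used in the original work. It assumes an average residue mass of about 110 Da (a common rule of thumb) and uses only the 101 GBS residues printed in the GAS/GBS alignment above, which may not be the complete SEQ ID 3062. The two reported sizes differ by 25.0 kDa, which is the kind of difference expected when a short His tag is replaced by a much larger GST moiety.

```python
# Rule-of-thumb value: about 110 Da per amino-acid residue. This is an
# assumption for illustration, not a value taken from the text.
AVERAGE_RESIDUE_MASS_DA = 110.0

def estimate_mass_kda(sequence):
    """Estimate protein mass in kDa from residue count; gaps and spaces are ignored."""
    residues = [ch for ch in sequence if ch.isalpha()]
    return len(residues) * AVERAGE_RESIDUE_MASS_DA / 1000.0

# Residues 1-101 of the GBS protein as printed in the GAS/GBS alignment.
gbs_segment = (
    "MIKIGLFCAAGFSTGMLVNNMKIAADKEGIEAHIEAYSQGKIADYAKDLDVALLGPQVSY"
    "TLDKSKSICDEYGVPIAVIPMADYGMLDGVKVLKLALSLLE"
)
untagged = estimate_mass_kda(gbs_segment)
print(f"{len(gbs_segment)} residues, about {untagged:.1f} kDa before any fusion tag")
print(f"difference between reported fusion sizes: {37.6 - 12.6:.1f} kDa")
```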

The GBS180-GST fusion product was purified (Figure 204, lane 8) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 298), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 






WO 02/34771 



PCT/GB01/04789 




Example 999 

A DNA sequence (GBSx1059) was identified in S.agalactiae <SEQ ID 3065> which encodes the amino 
acid sequence <SEQ ID 3066>. This protein is predicted to be PTS system, cellobiose-specific IIC component 
(celB). Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.5670 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA17390 GB:U07818 cellobiose phosphotransferase enzyme II'' 
[Bacillus stearothermophilus] 
Identities = 160/415 (38%), Positives = 251/415 (59%), Gaps = 13/415 (3%) 

Query: 15 KFVNMRGIIALKDGMLAILPLTWGSLFLILGQLPFKGLNQAIANVFGPEWTEPFMQVYS 74 

K R + A++DG++ +PL ++GSLFLI+G LP G N+ +A FG W + + 
Sbjct: 18 KIAEQRHLQAIRDGIILSMPLLIIGSLFLIVGFLPIPGYNEWMAKWFGEHWLDKLLYPVG 77 

Query: 75 GTFAIMGLISCFAIAYAYAKNSSVEPLPAGVLSLSSFFILMKSSYIPVKGEA IA 128 

TF IM L+ F +AY A+ V+ L AG +SL++F +L +P E ++ 

Sbjct: 78 ATFDIMALWSFGVAYRIAEKYKTOALSAGAISLftAF-LLATPYQVPFTPEGAKETIMVS 136 

Query: 129 DAISKVWFGGQGI IGAI I IGLWGAIYTWFIQHHIVIKMPEQVPQAIAKQFEAMIPAFVI 188 

I W G +G+ A+I+ +V IY IQ +IVIK+P+ VP A+A+ F A+IP + 
Sbjct: 137 GGIPVQWVGSKGLFVAMILAIVSTEIYRKIIQKNIVIKLPDGVPPAVARSFVALIPGAAV 196 

Query: 189 FLLSMIVYLIAKVTTGGTFIEMIYDIIQVPLQGLTGSLYGAIGIAFFISFLWWFGVHGQS 248 

++ + LI ++T +F ++ ++ PL h GS++GAI + LW G+HG + 

Sbjct: 197 LVVVWARLILEMTPFESFHNIVSVLLNKPLSVLGGSVFGAIVAVLLVQLLWSTGLHGAA 256 

Query: 249 WNGIVTALLLSNLDANKSLLAAN-RLTLDNGAHIVTQQFLDSFLILSGSGITFGLVIAM 307 

+V G++ + LS +D N+ + N L N ++TQQF D ++ + GSG T L + M 
Sbjct: 257 IVGGVMGPIWLSLMDENRMVFQQNPNAELPN- - -VITQQFFDLWIYIGGSGATLALALTM 313 

Query: 308 LFAAKSKQYKALGKVAAFPAIFITOI^PIVFGFPIVIWPVMFLPFILVPVIAALIVYGAIA 367 

+F A+S+Q K+LG++A P IFN+NEPI FG PIVMNP++ +PFILVPV+ ++ Y A+A 
Sbjct: 314 MFRARSRQLKSLGRLAIAPGIFNINEPITFGMPIVMNPLLIIPFILVPWLVWSYAAMA 373 

Query: 368 VGFMQPFSGVTLPWSTPAIISGFMVGGWQ--GALVQIVILAISTAVYFPFFKIQD 420 

G + SGV +PW+TP +ISG++ G + G+++QIV 1+ A+Y+PFF I D 
Sbjct: 374 TGLVAKPSGVAVPWTTPIVISGYLATGGKISGSILQIVNFFIAFAIYYPFFSIWD 428 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2215> which encodes the amino acid 
sequence <SEQ ID 2216>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.61 Transmembrane 140 - 156 ( 134 - 160) 
INTEGRAL Likelihood = -2.60 Transmembrane 229 - 245 ( 229 - 246) 
INTEGRAL Likelihood = -0.75 Transmembrane 72 - 88 ( 72 - 88) 

Final Results 

bacterial membrane --- Certainty=0.4567 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 366/428 (85%), Positives = 402/428 (93%), Gaps = 1/428 (0%) 

Query: 1 MSKFDSQKIITPIMKFVNMRGIIALKDGMLAILPLTVVGSLFLILGQLPFKGLNQAIANV 60 
M+K + Q II PIM FVNMRGIIALKDGMLAILPLTVVGSLFLI GQ+PF+G+N AIA+V 
Sbjct: 1 MAKMNMQNIIKPIMTFVNMRGIIALKDGMLAILPLTVVGSLFLIAGQIPFQGVNDAIASV 60 

Query: 61 FGPEWTEPFMQVYSGTFAIMGLISCFAIAYAYAKNSSVEPLPAGVLSLSSFFILMKSSYI 120 
FG +WTEPFMQVY GTFAIMGLISCFAI Y+YAKNS VEPLP+GVLSLS+FFIL++SSY+ 
Sbjct: 61 FGADWTEPFMQVYHGTFAIMGLISCFAIGYSYAKNSGVEPLPSGVLSLSAFFILLRSSYV 120 

Query: 121 PVKGEAIADAISKVWFGGQGIIGAIIIGLVVGAIYTWFIQHHIVIKMPEQVPQAIAKQFE 180 
P +GEAI DAISKVWFGGQGIIGAI+IGL VGA+YT FI+ HIVIKMP+QVPQAIAKQFE 
Sbjct: 121 PAEGEAIGDAISKVWFGGQGIIGAIVIGLTVGAVYTTFIRRHIVIKMPDQVPQAIAKQFE 180 

Query: 181 AMIPAFVIFLLSMIVYLIAK-VTTGGTFIEMIYDIIQVPLQGLTGSLYGAIGIAFFISFL 239 
AMIPAFVIF LSM+VY+IAK VT GGTFIEMIYD+IQVPLQGLTGSLYGA+GIAFFISFL 
Sbjct: 181 AMIPAFVIFTLSMLVYIIAKSVTGGGTFIEMIYDVIQVPLQGLTGSLYGALGIAFFISFL 240 

Query: 240 WWFGVHGQSVVNGIVTALLLSNLDANKSLLAANRLTLDNGAHIVTQQFLDSFLILSGSGI 299 
WWFGVHGQSVVNGIVTALLLSNLDANK+L+AA L+LD GAHIVTQQFLDSFLILSGSGI 
Sbjct: 241 WWFGVHGQSVVNGIVTALLLSNLDANKALMAAGELSLDKGAHIVTQQFLDSFLILSGSGI 300 

Query: 300 TFGLVIAMLFAAKSKQYKALGKVAAFPAIFNVNEPIVFGFPIVMNPVMFLPFILVPVLAA 359 
TFGLV+AM+FAAKSKQYKALGKVAAFPA+FNVNEP+VFGFPIVMNPVMFLPFILVPVLAA 
Sbjct: 301 TFGLVVAMIFAAKSKQYKALGKVAAFPALFNVNEPVVFGFPIVMNPVMFLPFILVPVLAA 360 

Query: 360 LIVYGAIAVGFMQPFSGVTLPWSTPAIISGFMVGGWQGALVQIVILAISTAVYFPFFKIQ 419 
L VYGAIA+GFMQPF+GVTLPWSTPAIISGFMVGGWQGA+VQI+IL +ST VYFPFFKIQ 
Sbjct: 361 LTVYGAIAIGFMQPFAGVTLPWSTPAIISGFMVGGWQGAIVQILILIMSTLVYFPFFKIQ 420 

Query: 420 DNITYKNE 427 
DN+ Y+NE 
Sbjct: 421 DNMAYQNE 428 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1000 

A DNA sequence (GBSx1060) was identified in S.agalactiae <SEQ ID 3067> which encodes the amino 
acid sequence <SEQ ID 3068>. This protein is predicted to be formate acetyltransferase 2 (pflB). Analysis 
of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5049 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC73910 GB:AE000184 putative formate acetyltransferase 
[Escherichia coli K12] 
Identities = 414/805 (51%), Positives = 555/805 (68%), Gaps = 14/805 (1%) 

Query: 25 LTERMYSYRDKVLD-KKPFIDAERAILVTEAYQKHQEKPNVLKRAYMLQNILEKMTIYID 83 
L++R+ ++++ ++ KP + ERA TE YQ+H +KP ++RA L + L TI+I 

Sbjct: 9 LSDRIKJffiKNALVHIVKPPVCTERA.QHYTEMYQQHLDKPIPVRRALALAHHLiANRTIWIK 68 

Query: 84 DETMIVGNQASSDKDAPIFPEYTLEFVVNELDLFEKRDGDVFYITEETKEQIRNIAPFWE 143 
+ +I+GNQAS + APIFPEYT+ ++ E+D R G F ++EE K + + P+W 
Sbjct: 69 HDELIIGNQASEVRAAPIFPEYTVSWIEKEIDDLADRPGAGFAVSEENKRVLHEVCPWWR 128 

Query: 144 NNNLRARAGVMLPEEVQVYMETGFFGMEGKMNSGDM 203 

++ R M +E + + TG EG M SGDAHLAVN+ LLE+GL G ++ + 
Sbjct: 129 GQTVQDRCYGMFTDEQKGLLATGIIKAEGNMTSGDAHLAVNFPLLLEKGLDGLREEVAER 188 


Query: 204 KADLDLTKPESIDKYHFYDSILITIEAVKTYAERFAILAKKQAKTANAK-RRQELLDIAS 262 
++ ++LT E + F +I I + AV + ERFA LA++ AT + RR ELL +A 
Sbjct: 189 RSRINLTVLEDLHGEQFLKAIDIVLVAVSEHIERFAALAREMAATETRESRRDELLAMAE 248 

Query: 263 ICERVPYYPAETFAEAVQSVWFIQCILQIESNGHSLSYGRFDQYMYPYVKSDLEAGRETE 322 

C+ + + P +TF +A+Q +FIQ ILQIESNGHS+S+GR DQY+YPY + D+E + + 
Sbjct: 249 NCDLIAHQPPQTFWQALQLCYFIQLILQIESNGHSVSFGRMDQYLYPYYRRDVELNQTLD 308 

Query: 323 -DSIVERLTNLWIKTITINKVRSQAHTFSSAGSPLYQNVTIGGQTR HKEDAVNPLSF 378 

+ +E L + W+K + +NK+RS +H+ +SAGSPLYQNVTIGGQ DAVNPLS+ 

Sbjct: 309 REHAIEMLHSCWLKLLEVNKIRSGSHSKASAGSPLYQNVTIGGQNLvDGQPMDAWPLSY 368 

Query: 379 LVLKSVAQTHLPQPNLTVRYHANLDKSFMNEAIEVMKLGFGMPAFNNDEIIIPSFIKKGV 438 
+L+S + QPNL+VRYHA + F++ ++V++ GFGMPAFNNDEI+IP FIK G+ 
Sbjct: 369 AILESCGRLRSTQPNLSVRYHAGMSNDFLDACVQVIRCGFGMPAFNNDEIVIPEFIKLGI 428 

Query: 439 SEEDAYDYSAIGCVETAVPGKWGYRCTGMSYINFPKVLIiITMNDGIDPASGKRFAP 494 

+DAYDY+AIGC+ETAV GKWGYRCTGMS+INF +V+L + G D SGK F P 
Sbjct: 429 EPQDAYDYAAIGCIETAVGGKWGYRCTGMSFINFARVMLAALEGGHDATSGKVFLPQEKA 488 


Query: 495 -SYGHFTQMTSYKELKEAWDKTLRYLTRMSVIVENAIDISLEREVPDILCSALTDDCIGR 553 

S G+F ++ E+ +AWD +RY TR S+ +E +D LE V DILCSAL DDCI R 
Sbjct: 489 LSAGNFN NFDEVMDAWDTQIRYYTRKSIEIEYVVDTMLEENVHDILCSALVDDCIER 545 

Query: 554 GKHLKEGGAVYDYISGLQVGIANLSDSLAALKKLVFEEKRLTTLEVWQALQSDYAGPRGE 613 

K +K+GGA YD++SGLQVGIANL +SLAA+KKLVFE+ + ++ AL D+ G E 
Sbjct: 546 AKSIKQGGAKYDWVSGLQVGIANLGNSLAAVKKLVFEQGAIGQQQLAAALADDFDGLTHE 605 

Query: 614 EIRQMLINEAPKYGNDDDYADSLVRECYDVYVEEIAKYPNTRYGRGPIGGIRYSGTSSIS 673 
++RQ LIN APKYGNDDD D+L+ Y Y++E+ +Y N RYGRGP+GG Y+GTSSIS 

Sbjct: 606 QLRQRLINGAPKYGNDDDTVDTLLARAYQTYIDELKQYHNPRYGRGPVGGNYYAGTSSIS 665 

Query: 674 ANVGQGRGTLATPDGRHAGTPLAEGCSPSHNMDKKGPTSVLKSVSKLPTDEIVGGVLLNQ 733 
ANV G T+ATPDGR A TPLAEG SP+ D GPT+V+ SV KLPT I+GGVLLNQ 
Sbjct: 666 ANVPFGAQTMATPDGRKAHTPLAEGASPASGTDHLGPTAVIGSVGKLPTAAILGGVLLNQ 725 

Query: 734 KVNPQTLAKEEDKQKLIALLRTFFNRLHGYHIQYNVVSRETLIDAQKHPEKHRDLIVRVA 793 

K+NP TL E DKQKL+ LLRTFF G+HIQYN+VSRETL+DA+KHP+++RDL+VRVA 
Sbjct: 726 KLNPATLENESDKQKLMILLRTFFEVHKGWHIQYNIVSRETLLDAKKHPDQYRDLVVRVA 785 






Query: 794 GYSAFFNVLSKATQDDIIARTEHAL 818 

GYSAFF LS QDDIIARTEH L 
Sbjct: 786 GYSAFFTALSPDAQDDIIARTEHML 810 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3069> which encodes the amino acid 
sequence <SEQ ID 3070>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4763 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 694/803 (86%), Positives = 747/803 (92%) 

Query: 16 QNSQKHFGYLTERMYSYRDKVLDKKPFIDAERAILVTEAYQKHQEKPNVLKRAYMLQNIL 75 

+ +FG+LT+RM YR+ VLDKKP+IDAERAIL TEAYQKHQ KP LKRAYMLQ IL 
Sbjct: 3 ETKSPYFGHLTDRMTHYREAVLDKKPYIDAERAILATEAYQKHQNKPANLKRAYMLQTIL 62 

Query: 76 EKMTIYIDDETMIVGNQASSDKDAPIFPEYTLEFVVNELDLFEKRDGDVFYITEETKEQI 135 

E MTIYI+DE++I GNQASS+KDAPIFPEYTLEFV+NELDLFEKRDGDVFYITEETK+Q+ 
Sbjct: 63 ENMTIYIEDESLIAGNQASSNKDAPIFPEYTLEFVIjNELDLFEKRDGDVFYITEETKQQL 122 

Query: 136 RNIAPFWENNNLRARAGVMLPEEVC^ 195 

R+IAPFWENNNLRAR GV+LPEEVQVYMETGFFGMEGKMNSGDAHLAVNYQKLLE GL G 

Sbjct: 123 RDIAPFWENNNLRARCGVLLPEEVQVYMETGFFGMEGKMNSGDAHLAVNYQKLLEHGLKG 182 

Query: 196 FEKKARKAKADLDLTKPESIDKYHFYDSILITIEAVKTYAERFAILAKKQAKTANAKRRQ 255 
FE++AR AKA LDLT PE+IDKYHFYDS+ I I+AVKTYA+R+A LA++ AKTA +R+ 

Sbjct: 183 FEERARAAKAALDLTIPENIDKYHFYDSVFIVIDAVKTYAKRYAKLARELAKTAKPERQA 242 

Query: 256 ELLDIASICERVPYYPAETFAEAVQSVWFIQCILQIESNGHSLSYGRFDQYMYPYVKSDL 315 
ELLDIA IC++VPY PA+TFAEAVQSVWFIQCILQIESNGHSLSYGRFDQYMYPYVK+DL 
Sbjct: 243 ELLDIARICDKVPYEPAKTFAEAVQSVWFIQCILQIESNGHSLSYGRFDQYMYPYVKADL 302 

Query: 316 EAGRETEDSIVERLTNLWIKTITINKVRSQAHTFSSAGSPLYQNVTIGGQTRHKEDAVNP 375 
EAGRETED+IVERLTNLWIKT+TINKVRSQAHTFSSAGSPLYQNVTIGGQTR K+DAVNP 
Sbjct: 303 EAGRETEDTIVERLTNLWIKTLTINKVRSQAHTFSSAGSPLYQNVTIGGQTRDKKDAVNP 362 


Query: 376 LSFLVLKSVAQTHLPQPNLTVRYHANLDKSFMNEAIEVMKLGFGMPAFNNDEIIIPSFIK 435 

LS+LVL+SVAQT LPQPNLTVRYH LD +FMNE IEVMKLGFGMPA NNDEIIIPSFIK 
Sbjct: 363 LSYLVLRSVAQTKLPQPNLTVRYHKGLDOTFMNECIEVMKLGFGMPAMNNDEIIIPSFIK 422 

Query: 436 KGVSEEDAYDYSAIGCVETAVPGKWGYRCTGMSYINFPKVLLITMNDGIDPASGKRFAPS 495 

KGVSEEDAYDYSAIGCVETAVPGKWGYRCTGMSYINFPK+LLITMNDGIDPASGKRFA 
Sbjct: 423 KGVSEEDAYDYSAIGCVETAVPGKWGYRCTGMSYINFPKILLITMNDGIDPASGKRFAKG 482 

Query: 496 YGHFTQMTSYKELKEAWDKTLRYLTRMSVIVENAIDISLEREVPDILCSALTDDCIGRGK 555 
+GHF MTSY+ELK AWD TLR +TRMSVIVENAID+ LEREVPDILCSALTDDCIGRGK 
Sbjct: 483 HGHFKDMTSYEELKAAWDATLREITRMSVIVENAIDLGLEREVPDILCSALTDDCIGRGK 542 

Query: 556 HLKEGGAVYDYISGLQVGIANLSDSLAALKKLVFEEKRLTTLEVWQALQSDYAGPRGEEI 615 
LKEGGAVYDYISGLQVGIANLSDSLAALKKLVFEE RLT E+W+AL+SD+AG RGE+I 
Sbjct: 543 TLKEGGAVYDYISGLQVGIANLSDSLAALKKLVFEEGRLTPEELWKALESDFAGERGEDI 602 

Query: 616 RQMLINEAPKYGNDDDYADSLVRECYDVYVEEIAKYPNTRYGRGPIGGIRYSGTSSISAN 675 

RQMLIN+APKYGNDDDYADSLV E YD Y++EIAKYPNTRYGRGPIGGIRYSGTSSISAN 
Sbjct: 603 RQMLINDAPKYGNDDDYADSLVVEAYDTYIDEIAKYPNTRYGRGPIGGIRYSGTSSISAN 662 


Query: 676 VGQGRGTLATPDGRHAGTPLAEGCSPSHNMDKKGPTSVLKSVSKLPTDEIVGGVLLNQKV 735 
VGQG+GTLATPDGRHAGTPLAEGCSP H+MDKKGPTSVLKSV+KLPTDEIVGGVLLNQKV 
Sbjct: 663 VGQGKGTLATPDGRHAGTPLAEGCSPEHSMDKKGPTSVLKSVAKLPTDEIVGGVLLNQKV 722 

Query: 736 NPQTLAKEEDKQKLIALLRTFFNRLHGYHIQYNVVSRETLIDAQKHPEKHRDLIVRVAGY 795 
NPQTLAKEEDK KL+ALLRTFFNRLHGYHIQYNVVSRETLIDAQKHPEKHRDLIVRVAGY 
Sbjct: 723 NPQTLAKEEDKLKLl^VLLRTFFNRLHGYHIQYNVVSRETLIDAQKHPEKHRDLIVRVAGY 782 

Query: 796 SAFFNVLSKATQDDIIARTEHAL 818 
SAFFNVLSKATQDDII RTEH L 
Sbjct: 783 SAFFNVLSKATQDDIIERTEHTL 805 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1001 

A DNA sequence (GBSx1061) was identified in S.agalactiae <SEQ ID 3071> which encodes the amino 
acid sequence <SEQ ID 3072>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1024 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA05516 GB:AJ002527 OrfX [Clostridium beijerinckii] 
Identities = 90/214 (42%), Positives = 131/214 (61%), Gaps = 1/214 (0%) 

Query: 1 MEFLLDTLNLEAIKKWHHILPLAGVTSNPTIAKKEGDIHFFQRIRDVREIIGREASLHVQ 60 
M+ ++D +N+E IK I + GVTSNP+I KG + I+ +RE IG + LHVQ 
Sbjct: 1 MKLIIDDVNIEKIKDVFSIFQIDGVTSNPSILHKYGKQPYEILIK-IREFIGENSELHVQ 59 

Query: 61 VVAKDYQGILDDAAKIRQETDDDIYIKVPVTPDGLAAIKTLKAEGYNITATAIYTSMQGL 120 
V+++ +G+L +A KI +E + Y+K+PVT DGL AIK L+ E N+TATAIYT MQ 
Sbjct: 60 VISESSEGMLKEAHKIIKELGKNTYVKIPVTRDGLKAIKILRKEEINVTATAIYTQMQAY 119 

Query: 121 LAISAGADYLAPYFNRMENLDIDATQVIKELAQAIERTGSSSKILAASFKNASQVTKALS 180 
LA AGA Y APY NR++NL + QV K++ E+ +++LAASFKN+ QV + 
Sbjct: 120 LAGKAGAQYAAPYVNRIDNLGANGVQVAKDIHDIFEKNNFKTEVLAASFKNSQQVLELCK 179 

Query: 181 QGAQSITAGPDIFESVFAMPSIAKAVNDFADDWK 214 
G + T PD+ E + + AV +F D++ 
Sbjct: 180 YGIGAATISPDVIEGLIKNDCVDVAVENFKKDFE 213 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3073> which encodes the amino acid 
sequence <SEQ ID 3074>. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1090 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 158/222 (71%), Positives = 194/222 (87%) 

Query: 1 MEFLLDTLNLEAIKKWHHILPLAGVTSNPTIAKKEGDIHFFQRIRDVREIIGREASLHVQ 60 
ME++LDTL+LEAIKKWHHILPLAGVTSNP+IAKKEG+I FF+RIR+VR IIG +AS+HVQ 
Sbjct: 1 MEYMLDTLDLEAIKKWHHILPLAGVTSNPSIAKKEGEIDFFERIREVRAIIGDKASIHVQ 60 

Query: 61 VVAKDYQGILDDAAKIRQETDDDIYIKVPVTPDGLAAIKTLKAEGYNITATAIYTSMQGL 120 
V+A+DY+GIL DAA+IR++ D +Y+KVPVT +GLAAIKTLKAEGY+ITATAIYT+ QGL 
Sbjct: 61 VIAQDYEGILKDAAEIRRQCGDSVYVKVPVTTEGLAAIKTLKAEGYHITATAIYTTFQGL 120 

Query: 121 LAISAGADYLAPYFNRMENLDIDATQVIKELAQAIERTGSSSKILAASFKNASQVTKALS 180 
LAI AGADYLAPY+NRMENL+ID VI++LA+AI R ++SKILAASFKN +QV K+ + 
Sbjct: 121 LAIFAGADYLAPYYNRMENLNIDPEAVIEQLAEAINRENANSKILAASFKNTO 180 

Query: 181 QGAQSITAGPDIFESVFAMPSIAKAVNDFADDWKASQHSEHI 222 
GAQ+ITAGPD+FE+ FAMPSI KAV+DF DW+A H + I 
Sbjct: 181 LGAQAITAGPDVFEAGFAMPSIQKAVDDFGKDWEAIHHRKSI 222 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1002 

A DNA sequence (GBSx1062) was identified in S.agalactiae <SEQ ID 3075> which encodes the amino 
acid sequence <SEQ ID 3076>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3086 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 9545> which encodes amino acid sequence <SEQ ID 9546> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA22477 GB:M65289 glycerol dehydrogenase [Bacillus 
stearothermophilus] 
Identities = 199/362 (54%), Positives = 271/362 (73%), Gaps = 2/362 (0%) 


Query: 4 KVFASPSRYIQGKDALFQSIEHIKSLGQTPLILCDDVVYNIVGERFLSYLQD-DLLPHRV 62 

+VF SP++Y+QGK+ + + +++ +G +++ D++V+ I G ++ L+ ++ V 
Sbjct: 5 RVFISPAKYVQGKOTITKIANYLEGlGNKaOTIADEIVWKIAGHTIVNELKKGNIAAEEV 64 

Query: 63 SFNGEASDNEINRVVAVAKEKNSDLIIGLGGGKTIDSAKAIADKVNLPVVIAPTVASTDA 122 

F+GEAS NE+ R+ +A++ + ++IG+GGGKT+D+AKA+AD+++ +VI PT ASTDA 
Sbjct: 65 VFSGEASRNEVERIANIARKAEARIVIGVGGGKTLDTAKAVADELDAYIVIVPTAASTDA 124 

Query: 123 PTSALSVIYTDEGAFEKYIFYSKNPDLVLVDTQVIAQAPKRLLASGIADGLATWVEARAV 182 
PTSALSVIY+D+G FE Y FY KNPDLVLVDT++IA AP RLLASGIAD LATWVEAR+V 

Sbjct: 125 PTSALSVIYSDDGVFESYRFYKKNPDLVLVDTKIIANAPPRLLASGIADALATWVEARSV 184 

Query: 183 LQKNGlAMAGGRQTLAGVAIAQACERTLFNDSLQAIAACDAKATVTKALENVIEflNTLLSG 242 
++ G MAGG T+A AIA+ CE+TLF A + AKWT ALE V+EANTLLSG 

Sbjct: 185 IKSGGKTMAGGIPTIAAEAIAEKCEQTLFKYGKIAYESVKAKVVTPALEAVVEANTLLSG 244 

Query: 243 LGFESAGLAAAHAIHNGFTALSGDIHHLTHGEKVAYGTLTQLFLENRPKEEIDRYINLYQ 302 

LGFES GLAAAHAIHNGFTAL G+IHHLTHGEKVA+GTL QL LE ++EI+RYI LY 
Sbjct: 245 LGFESGGLAAAHAIHNGFTALEGEIHHLTHGEKVAFGTLVQLALEEHSQQEIERYIELYL 304 


Query: 303 AIGMPTTLAELHLGDATYEELLKVGQQATIEGETIHEMPFKISAEDVAAALLTVDRYVSN 362 

++ +P TL ++ L DA+ E++LKV + AT EGETIH F ++A+DVA A+ D+Y 
Sbjct: 305 SLDLPVTLEDIKLKDASREDILKVAKAATAEGETIHN-AFNVTADDVADAIFAADQYAKA 363 

Query: 363 HQ 364 

++ 

Sbjct: 364 YK 365 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3077> which encodes the amino acid 
sequence <SEQ ID 3078>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -4.62 Transmembrane 101 - 117 ( 98 - 119) 

Final Results 

bacterial membrane --- Certainty=0.2848 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAA22477 GB:M65289 glycerol dehydrogenase [Bacillus 
stearothermophilus] 
Identities = 202/357 (56%), Positives = 261/357 (72%), Gaps = 1/357 (0%) 

Query: 2 KVFASPSRYIQGKNALFTNVKTLKQLGDSPILLCDDVVYGIVGERFESYLIDNGMTPVHV 61 

+VF SP++Y+QGKN + L+ +G+ +++ D++V+ I G + L + V 

Sbjct: 5 RVFISPAKYVQGKNVITKIANYLEGIGNKTWIM)EITOKIAGHTIVNELKKGNIAAEEV 64 

Query: 62 AFNGEASDNEISRWAIAKENGNDVIIGLGGGKTIDSAKAIADLIAVPVIIAPTIASTDA 121 

F+GEAS NE+ R+ IA++ ++IG+GGGKT+D+AKA+AD L ++I PT ASTDA 
Sbjct: 65 VFSGEASRNEVERIANIARKAEAAIVIGVGGGICrLDTAKAVADEIjDAYIVIVPTAASTDA 124 

Query: 122 PTSALSVIYTDEGAFEKYIFYSKNPDLVLVDTQVICQAPKRLLASGIADGLATWVEARAV 181 

PTSALSVIY+D+G FE Y FY KNPDLVLVDT++I AP RLLASGIAD LATWVEAR+V 
Sbjct: 125 PTSALSVIYSDDGVFESYRFYKKNPDLVLVDTKIIANAPPRLLASGIADALATWVEARSV 184 

Query: 182 MQKNGDTMAGGNQT1AGVAIAKACEQTLFADGLKAMASCDRQVVTPALENVIEANTLLSG 241 

++ G TMAGG T+A AIA+ CEQTLF GAS +WTPALE V+EANTLLSG 
Sbjct: 185 IKSGGKT^GGIPTIAAEAIAEKCEQTLFKYGKLAYESVlCAKOTrPALEAVVEANTLLSG 244 

Query: 242 LGFESAGLAAAHAIHNGFTALTGAIHHLTHGEKVAYGTLTQLFLENRSREEIDRYIDFYQ 301 

LGFES GLAAAHAI HNGFTAL G IHHLTHGEKVA+GTL QL LE S++EI+RYI+ Y 
Sbjct: 245 LGFESGGLAAAHAIHNGFTALEGEIHHLTHGEKVAFGTLVQLALEEHSQQEIERYIELYL 304 

Query: 302 AIGMPTTLKEMHLDTATQEDFLKIGRQATMAGETIHQMPFVISPEDVAAALVAVDAY 358 

++ +P TL+++ L A++ED LK+ + AT GETIH F ++ +DVA A+ A D Y 
Sbjct: 305 SLDLPVTLEDIKLKDASREDILKVAKAATAEGETIHN-AFNVTADDVADAIFAADQY 360 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 287/361 (79%), Positives = 325/361 (89%), Gaps = 1/361 (0%) 

Query: 3 MKVFASPSRYIQGKDALFQSIEHIKSLGQTPLILCDDVVYNIVGERFLSYLQDD-LLPHR 61 
MKVFASPSRYIQGK+ALF +++ +K LG +P++LCDDVVY IVGERF SYL D+ + P 
Sbjct: 1 MKVFASPSRYIQGKNALFTNVKTLKQLGDSPILLCDDVVYGIVGERFESYLIDNGMTPVH 60 

Query: 62 VSFNGFASDNEINRWAVAKEKNSDLIIGLGGGKTIDSAKAIADKAOTLPWIAPTVASTD 121 

V+ FNGEASDNE I +RWA+AKE +D+IIGLGGGKTIDSAKAIAD + +PV+IAPT+ASTD 
Sbjct: 61 VAFNGEASDNEI SRWAIAKENGNDVI IGLGGGKTIDSAKAIADLLAVPVI IAPTIASTD 120 

Query: 122 APTSALSVIYTDEGAFEKYIFYSKNPDLVLVDTQVIAQAPKRLLASGIADGLATWVEARA 181 

APTSALSVIYTDEGAFEKYIFYSKNPDLVLVDTQVI QAPKRLLASGIADGLATWVEARA 
Sbjct: 121 APTSALSVIYTDEGAFEKYIFYSKNPDLVLVDTQVICQAPKRLLASGIADGLATWVEARA 180 

Query: 182 VLQKNGIAMAGGRQTIiAGVAIAQACERTLFMJSLQALAACDAKVVTKALEOTIFJUSITLLS 241 
V+QKNG MAGG QTLAGVAIA+ACE+TLF D L+A+A+CD +WT ALENVIEANTLLS 
Sbjct: 181 VMQKNGDTMAGGNQTLAGVAIAKACEQTLFADGLKAMASCDRQVVTPALENVIEANTLLS 240 

Query: 242 GLGFESAGLAAAHAIHNGFTALSGDIHHLTHGEKVAYGTLTQLFLENRPKEEIDRYINLY 301 

GLGFESAGLAAAHAIHNGFTAL+G IHHLTHGEKVAYGTLTQLFLENR +EEIDRYI+ Y 
Sbjct: 241 GLGFESAGIAAAHAIHNGFTALTGAIHHLTHGEKVAYGTLTQLFLENRSREEIDRYIDFY 300 

Query: 302 QAIGMPTTLAELHLGDATYEELLKVGQQATIEGETIHEMPFKISAEDVAAALLTVDRYVSN 362 

QAIGMPTTL E+HL AT E+ LK+G+QAT+ GETIH+MPF IS EDVAAAL+ VD YV++ 
Sbjct: 301 QAIGMPTTLKEMHLDTATQEDFLKIGRQATMAGETIHQMPFVISPEDVAAALVAVDAYVTS 361 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1003

A DNA sequence (GBSx1063) was identified in S.agalactiae <SEQ ID 3079> which encodes the amino
acid sequence <SEQ ID 3080>. Analysis of this protein sequence reveals the following:

Possible site: 28
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.75 Transmembrane 262 - 278 ( 262 - 279)

Final Results

bacterial membrane Certainty=0.1298 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
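
The "Final Results" block lists one certainty value per predicted location, and the location marked "Affirmative" is the one with the highest value. A minimal Python sketch of reading such a block follows; it is an illustration only, the names parse_final_results and best_location are hypothetical, and the pattern deliberately tolerates optional whitespace inside the number and an optional leading margin number, as can occur in scanned text.

```python
import re

# One result line looks like:
#   bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>
# The dashes are optional and the number may contain stray spaces.
LINE = re.compile(
    r"^\s*(?:\d+\s+)?bacterial\s+(\w+)[\s\-]*"
    r"Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for every result line."""
    results = {}
    for line in text.splitlines():
        match = LINE.match(line)
        if match:
            location, raw_value, verdict = match.groups()
            # Remove whitespace inside the number before converting it.
            value = float(re.sub(r"\s+", "", raw_value))
            results[location] = (value, verdict.strip())
    return results


def best_location(results):
    """Location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])


if __name__ == "__main__":
    block = """bacterial membrane Certainty=0.1298 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>"""
    parsed = parse_final_results(block)
    print(parsed)
    print(best_location(parsed))
```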

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA88310 GB:AB028865 O-acetylserine lyase [Streptococcus suis] 
Identities = 239/304 (78%), Positives = 273/304 (89%)

Query: 4 IYNSITDLIGNTPIIQLHHIVPEGAAEVYVKLESFNPGSSVKDRIALAMIEDAEQKGILK 63 

IY +IT L+G TP+ 1 +L++ IVPEGAAE VYVKLE+FNPGSSVKDRIALAMIEDAE+ G +K 
Sbjct: 3 IYQNITQLVGKTPVIKIjNNIVPEGAAEVYVKLEAFNPGSSVKDRIALAMIEDAEKAGTIK 62 


Query: 64 AGDTIVEPTSGNTGIGLAWGKAKGYNVIIVMPETMSIERRKIIQAYGAQLVLTPGSEGM 123 

GDTIVEPTSGNTGIGLAWVG AKGYNVIIVMPETMS+ERRKIIQAYGA+LVLTPGSEGM 
Sbjct: 63 PGDTIVEPTSGNTGIGLAWVGAAKGYNVIIVMPETMSVERRKIIQAYGAELVLTPGSEGM 122 

Query: 124 KGAIAKAKEISAEQNAWLPLQFNNQANPEIHEKTTGREIIETFGEKGLDAFIAGVGTGGT 183

KGAIAKAKE I + E+N W+P QF N +NP++HE TTG+EI+E FG GLDAF++GVGTGGT 
Sbjct: 123 KGAIAKAKE IAEEKNGWVPFQFANPSNPKVHEDTTGQEILEDFGTTGLDAFVSGVGTGGT 182 

Query: 184 ITGVSRALKKVNPDVAIYAVEADESAILSGEQPGPHKIQGISAGFIPETLATDSYDHIIR 243 
++GVS LK NPD+AIYAVEADESA+LSGE PGPHKIQGISAGFI P+TL T +YD IIR

Sbjct: 183 VSGVSHVLKTANPDIAIYAVEADESAVLSGEAPGPHKIQGISAGFIPDTLDTSAYDGIIR 242 

Query: 244 VTSDDAIETGRIIGGLEGFLAGISASAAIYAAIEVAKQLGKGKKVLALLPDNGERYLSTS 303 
V SDDA+ TGR IGG EGFL GIS+ AAI +AAIE VAK+LG GKKVLA+LPDNGERYLST+ 
Sbjct: 243 VKSDDALATGRAIGGKEGFLVGISSGAAIHAAIEVAKELGTGKKVLAILPDNGERYLSTA 302

Query: 304 LYDF 307 
LY+F 

Sbjct: 303 LYEF 306 
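
The identity count in an alignment summary is the number of columns in which the Query and Sbjct rows carry the same residue. As a minimal sketch (hypothetical helper names, Python chosen for illustration), the count and an identity-only midline can be recomputed from a pair of aligned rows such as the final "LYDF" / "LYEF" row above. The "+" marks for conservative substitutions depend on a scoring matrix and are intentionally not reproduced here.

```python
def count_identities(query, subject):
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    # Gap characters ('-') never count as identities.
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


def identity_midline(query, subject):
    """Midline showing the residue where rows agree, a space elsewhere."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        q if q == s and q != "-" else " " for q, s in zip(query, subject)
    )


if __name__ == "__main__":
    query_row = "LYDF"
    subject_row = "LYEF"
    print(count_identities(query_row, subject_row))
    print(repr(identity_midline(query_row, subject_row)))
```

Summing the per-row counts over all rows of one alignment gives a figure that can be compared with the numerator of the printed "Identities" field.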


A related DNA sequence was identified in S.pyogenes <SEQ ID 3081> which encodes the amino acid
sequence <SEQ ID 3082>. Analysis of this protein sequence reveals the following:

Possible site: 58
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.48 Transmembrane 262 - 278 ( 262 - 278)

Final Results

bacterial membrane Certainty=0.1192 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAA88310 GB:AB028865 O-acetylserine lyase [Streptococcus suis] 
Identities = 235/303 (77%), Positives = 261/303 (85%)

Query: 4 IYKTITELVGQTPIIKIiNRLIPNEAADvYvKLEAFNPGSSVKDRIALSMIEAAEAEGLIS 63 

IY+ IT+LVG+TP+IKLN ++P AA+VYVKLEAFNPGSSVKDRIAL+MIE AE G I 
Sbjct: 3 IYQNITQLVGKTPVIKLNNIVPEGAAEVYVKLEAFNPGSSVKDRIALAMIEDAEKAGTIK 62 



Query: 64 PGDVIIEPTSGNTGIGLAWVGAAKGYRVIIVMPETMSLERRQIIQAYGAELVLTPGAEGM 123

PGD I+EPTSGNTGIGLAWVGAAKGY VIIVMPETMS+ERR+IIQAYGAELVLTPG+EGM
Sbjct: 63 PGDTIVEPTSGNTGIGLAWVGAAKGYNVIIVMPETMSVERRKIIQAYGAELVLTPGSEGM 122

Query: 124 KGAIAKAETLAIELGAWMPMQFNNPANPSIHEKTTAQEILEAFKEISLDAFVSGVGTGGT 183

Sbjct: 123 KGAIAKAKEIAEEKNGWVPFQFANPSNPKVHEDTTGQEILEDFGTTGLDAFVSGVGTGGT 182

Query: 184 LSGVSHVLKKANPETVIYAVEAEESAVLSGQEPGPHKIQGISAGFIPNTLDTKAYDQIIR 243

Sbjct: 183 VSGVSHVLKTANPDIAIYAVEADESAVLSGEAPGPHKIQGISAGFIPDTLDTSAYDGIIR 242

Query: 244 VKSKDALETARLTGAKEGFLVGISSGAALYAAIEVAKQLGKGKHVLTILPDNGERYLSTE 303

VKS DAL T R G KEGFLVGISSGAA++AAIEVAK+LG GK VL ILPDNGERYLST
Sbjct: 243 VKSDDALATGRAIGGKEGFLVGISSGAAIHAAIEVAKELGTGKKVLAILPDNGERYLSTA 302

Query: 304 LYD 306

LY+
Sbjct: 303 LYE 305

An alignment of the GAS and GBS proteins is shown below. 

Identities = 222/306 (72%) , Positives = 263/306 (85%) 



Query: 1 MSKIYNSITDLIGNTPIIQLHHIVPEGAAEVYVKLESFNPGSSVKDRIALAMIEDAEQKG 60

M+KIY +IT+L+G TPII+L+ ++P AA+VYVKLE+FNPGSSVKDRIAL+MIE AE +G
Sbjct: 1 MTKIYKTITELVGQTPIIKLNRLIPNEAADVYVKLEAFNPGSSVKDRIALSMIEAAEAEG 60

Query: 61 ILKAGDTIVEPTSGNTGIGLAWVGKAKGYNVIIVMPETMSIERRKIIQAYGAQLVLTPGS 120

++ GD I+EPTSGNTGIGLAWVG AKGY VIIVMPETMS+ERR+IIQAYGA+LVLTPG+
Sbjct: 61 LISPGDVIIEPTSGNTGIGLAWVGAAKGYRVIIVMPETMSLERRQIIQAYGAELVLTPGA 120

Query: 121 EGMKGAIAKAKEISAEQNAWLPLQFNNQANPEIHEKTTGREIIETFGEKGLDAFIAGVGT 180

EGMKGAIAKA+ ++ E AW+P+QFNN ANP IHEKTT +EI+E F E LDAF++GVGT
Sbjct: 121 EGMKGAIAKAETLAIELGAWMPMQFNNPANPSIHEKTTAQEILEAFKEISLDAFVSGVGT 180

Query: 181 GGTITGVSRALKKVNPDVAIYAVEADESAILSGEQPGPHKIQGISAGFIPETLATDSYDH 240

GGT++GVS LKK NP+ IYAVEA+ESA+LSG++PGPHKIQGISAGFIP TL T +YD
Sbjct: 181 GGTLSGVSHVLKKANPETVIYAVEAEESAVLSGQEPGPHKIQGISAGFIPNTLDTKAYDQ 240

Query: 241 IIRVTSDDAIETGRIIGGLEGFLAGISASAAIYAAIEVAKQLGKGKKVLALLPDNGERYL 300

IIRV S DA+ET R+ G EGFL GIS+ AA+YAAIEVAKQLGKGK VL +LPDNGERYL
Sbjct: 241 IIRVKSKDALETARLTGAKEGFLVGISSGAALYAAIEVAKQLGKGKHVLTILPDNGERYL 300

Query: 301 STSLYD 306

ST LYD
Sbjct: 301 STELYD 306





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1004 

A DNA sequence (GBSx1064) was identified in S.agalactiae <SEQ ID 3083> which encodes the amino
acid sequence <SEQ ID 3084>. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3666 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07349 GB:AP001519 unknown conserved protein [Bacillus halodurans] 
Identities = 96/204 (47%) , Positives = 127/204 (62%) 

Query: 2 NYKTIKSDGIVEEEIKKSRFICHLKRVESEEEGRNYITQIKKAHYKANHSCSAMVIGEKG 61

+Y T+K GI E I+KSRFI HL R SEEE +1 QIKK H+ A H+CSA +IGE 
Sbjct: 4 SYYTVKESGIHEISIQKSRFIAHLSRATSEEEAIQFIEQIKKEHWNATHNCSAYLIGEND 63 


Query: 62 D I KRSSDDGEPSGTAGI PMLTVLEKQGLTNVVAVVTRYFGGI KLGAGGL I RAYSGSVANT 121 

+++++DDGEPSGTAG+PML VL+K+ L + VAWTRYFGG+KLGAGGLIRAY +V++ 
Sbjct: 64 QVQKANDDGEPSGTAGVPMLE^KKmLKDTmWTRYFGGVKLGAGGLIRAYGSAVSDG 123 

Query: 122 IKEIGVVEVKEQIGIRIQLTYPQYQTFDNFLKEHHLQEFETEFLEAVTCKIYVDPKEFEH 181

+ IGWE K I + Y +N L++ H E +LE V + YV EE 

Sbjct: 124 LNAIGVVERKRMQVIHTSIDYHWLGK^NELRQSHYLLKEISYLEISrVDVQTYVLEAEVES 183 

Query: 182 TITNLTEFYQGKALLTEEGSQIVE 205 
+T G+A T + +E

Sbjct: 184 YCEWMTNLTNGQAAFTHGAIEYLE 207 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3085> which encodes the amino acid
sequence <SEQ ID 3086>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.43 Transmembrane 86 - 102 ( 86 - 102)

Final Results

bacterial membrane Certainty=0.1171 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9153> which encodes the amino acid sequence
<SEQ ID 9154>. Analysis of this protein sequence reveals the following:

Possible site: 31
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.43 Transmembrane 81 - 97 ( 81 - 97)

Final Results

bacterial membrane Certainty=0.1171 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 122/206 (59%), Positives = 153/206 (74%)

Query: 2 NYKTI KSDGI VEEE I KKSRFI CHLKR VESEEEGRNYITQI KKAHYKANHS CSAMVIGEKG 61 
++KTIK+ G EE IKKSRFICH+KRV +EE+G+N++ IKK HYKANHSC AM+IG 
Sbjct: 8 HFKTIKASGFFEESIKKSRFICHIKRVSTEEDGKNFWAIKKEHYKRNHSCFAMIIGNNR 67

Query: 62 DIKRSSDDGEPSGTAGIPMLTVLEKQGLTNVVAVVTRYFGGIKLGAGGLIRAYSGSVANT 121 

IKRSSDDGEPSGTAGIP+L+VLEKQ LTNVV WTRYFGGIKLG GGLIRAYS A 
Sbjct: 68 QIKRSSDDGEPSGTAGIPILSVLEKQCLTNWVWTRYFGGIKLGTGGLIRAYSNMTATA 127 






Query: 122 IKEIGWEVKEQIGIRIQLTYPQYQTFDNFLKEHHLQEFETEFLEAVTCKIYVDPKEFEH 181 

IK G++EVK+QIG+ I L+YPQYQ + N L + L E ET+F + + +Y D + E+ 
Sbjct: 128 IKRFGIIEVKQQIGLEITLSYPQYQLYSNLLDQLALTETETKFSDTIKTTLYCDTERVEN 187 



Query: 182 TITNLTEFYQGKALLTEEGSQIVEIP 207

I LT +Y G+ + GS+++E P 
Sbjct: 188 LIDTLTNYYHGQISCEKIGSKVIEFP 213 






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1005

A DNA sequence (GBSx1065) was identified in S.agalactiae <SEQ ID 3087> which encodes the amino
acid sequence <SEQ ID 3088>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1421 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC44940 GB:U56901 involved in transformation [Bacillus subtilis] 
Identities = 160/405 (39%) , Positives = 228/405 (55%) , Gaps = 20/405 (4%) 

Query: 35 YICTRCSSSVAKNCQL PTGNYYCRECIVFGRVTSNENLYYFPQKTFSKTNSLK--W 88 

Y C RC + + YCR C++ GRV+ LY + ++ S S+K W 

Sbjct: 58 YRCNRCGQTDQRYFSFYHS SGKNKLYCRS CVMMGRVSEEVPLYSWKEENESNWKS I KLTW 117 

Query: 89 KGELTPYQNEVSEELLKGISSKENLLVHAVTGAGKTEMIYHSVAKVIDTGGSVCIASPRI 148 

G+L+ Q + + L++ IS KE LL+ AV GAGKTEM++ + ++ G VCIA+PR 
Sbjct: 118 DGKLSSGQQKAANVLIEAISKKEELLIWAVCGAGKTEMLFPGIESALNQGLRVCIATPRT 177 

Query: 149 DVCLELYKRLSNDFRCA-ITLMHGESPSYQR-SPLTIATTHQLLKFYHAFDLLIVDEVDA 206 

DV LEL RL F+ A 1+ ++G S R SPL I+TTHQLL++ A D++I+DEVDA 
Sbjct: 178 DWLELAPRLKAAFQGADISALYGGSDDKGRLSPLMISTTHQLLRYKDAIDvMIIDEVDA 237 

Query: 207 FPYVDNPILYQGVKQALKENGTSIFLTATSTTELERKVARKELKKLHLARRFHANPLVIP 266 

FPY + L V++A K+N T ++L+AT EL+RK +L + + R H PL P 
Sbjct: 238 FPYSADQTLQFAVQKARKKNSTLVYLSATPPKELKRKALNGQLHSVRIPARHHRKPLPEP 297 

Query: 267 EMVWVSGIQKSLQTQKLPPKLYQLINKQRQTRYPLLLFFPHISEGQVFTEILRQAFPMEK 326 

VW +K L K+PP + + I + P+ LF P +S IL +A K 

Sbjct: 298 RFWCGNWKKKLNRNKIPPAVKRWIEFHVKEGRPVFLFVPSVS ILEKAAACFK 350 

Query: 327 IGFVSSKSTSRLKLVQDFRDNKLSILVSTTILERGVTFPSVDVFVIQANHHLFTK 381 

V ++ R + VQ FRD +L +L++TTILERGVT P V V+ A +FT+ 
Sbjct: 351 GVHCRTAS VHAEDKHRKEKVQQFRDGQLDLL I TTT I LERG VTVPKVQTGVLGAESS I FTE 410 

Query: 382 SSLVQISGRVGRALERPEGLLYFLHDGKSKSMHQAIKEIKNMNHI 426 

S+LVQI+GR GR E +G + + H GK+KSM A K IK MN + 
Sbjct: 411 SALVQIAGRTGRHKEYADGDVIYFHFGKTKSMLDARKHIKEMNEL 455 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3089> which encodes the amino acid
sequence <SEQ ID 3090>. Analysis of this protein sequence reveals the following:

Possible site: 21
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.09 Transmembrane 304 - 320 ( 303 - 322)

Final Results

bacterial membrane Certainty=0.2635 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

!GB:U56901 involved in transformation [Bacillus subt... 258 1e-67

>GP:AAC44940 GB:U56901 involved in transformation [Bacillus subtilis]
Identities = 155/435 (35%), Positives = 249/435 (56%), Gaps = 20/435 (4%)



Query: 10 RLLLESQLPDSAKQLAQPLK SWILRGKMI CQRCHYQLDEEA RLPSG 56 




R LIi ++L S + + +K S+ I + + C RC Q D+ 

Sbjct: 22 RHLLRTELSFSDEMIEWHIKNGYITAENSISINKRRYRCNRCG-QTDQRYFSFYHSSGKN 80 

Query: 57 AYYCRFCLVFGRNQSDKLLYAIPPMHFP- -KGNYLVWGGQLTAYQEMISQQLLINMQNQK 114 
YCR C++ GR + LY+ + K L W G+L++ Q+ + L+ + ++

Sbjct: 81 KLYCRSCVMMGRVSEEVPLYSWKEENES1OTK3IKLTWDGKLSSGQQKAANVLIEAISKKE 140 

Query: 115 TTLVHAVTGAGKTEMIYAAIEAVINTGGWVCIASPRVDVCVEVATRLSQAFS-CSICLMH 173 
L+ AV GAGKTEM++ IE+ +N G VCIA+PR DV +E+A RL AF I ++ 
Sbjct: 141 ELLIWAVCGAGKTEMLFPGIESALNQGLRVCIATPRTDVVLEIAPRLKAAFQGADISALY 200

Query: 174 AESLPYQR-APIIVATTHQLLKFHKAFDLLIIDEVDAFPFVMNIQLHYAASQALKEGGAK 232 

S R +P++++TTHQLL++ A D++I1DEVDAFP+ + L +A +A K+ 
Sbjct: 201 GGSDDKGRLSPLMISTTHQLLRYKDAIDVMI IDEVDAFPYSADQTLQFAVQKARKKNSTL 260 


Query: 233 ILLTATSTRTLERKVNKGEWKLTLARRFHNRPLVIPKFIRSFNLFKMIHRQKLPLKILK 292 

+ L+AT + L+RK G++ + + R H +PL P+F+ N K ++R K+P + + 
Sbjct: 261 WLSATPPKELKRKALNGQLHSWIPARHHRKPLPEPRFWCGNWKKKLNRNKIPPAVKR 320 

Query: 293 YLKKQRKTGYPLLIFLPTIIMAESVTAILKELLPAEQIACVSSQSQNRKEDITAFRQGKK 352

+++ K G P+ +F+P++ + E A K + +AV++ ++RKE + FR G+ 
Sbjct: 321 WIEFHVKEGRPVFLFVPSVSILEKAAACFKGV--HCRTASVHAEDKHRKEKVQQFRDGQL 378 

Query: 353 TILITTSILERGVTFPQIDVFVLGSHHRVYSSQSLVQIAGRVGRSIDRPDGTLYFFHEGI 412 
+LITT+ILERGVT P++ VLG+ +++ +LVQIAGR GR + DG + +FH G

Sbjct: 379 DLLITTTILERGVTVPKVQTGVLGAESSIFTESALVQIAGRTGRHKEYADGDVIYFHFGK 438 

Query: 413 SKAMLLARKEIKEMN 427 
+K+ML ARK IKEMN 
Sbjct: 439 TKSMLDARKHIKEMN 453

An alignment of the GAS and GBS proteins is shown below.

Identities = 223/427 (52%), Positives = 299/427 (69%)

Query: 1 MENYLGRLWTKAQLSEQLRKIAISLPSFIKKGSDYICTRCSSSVAKNCQLPTGNYYCREC 60

+EN GRL ++QL + +++A L S + IC RC + + +LP+G YYCR C 

Sbjct: 4 IENSYGRLLLESQLPDSAKQIAQPLKSWILRGKMICQRCHYQLDEEARLPSGAYYCRFC 63 

Query: 61 IVFGRVTSNENLYYFPQKTFSKTNSLKWKGELTPYQNEVSEELLKGISSKENLLVHAVTG 120 
+VFGR S++ LY P F K N L W G+LT YQ +S++LL + +++ LVHAVTG

Sbjct: 64 LVFGRNQSDKLLYAIPPMHFPKGNYLWGGQLTAYQEMISQQLLINMQNQKTTLVHAVTG 123 

Query: 121 AGKTEMIYHSVAKVIDTGGSVCIASPRIDVCLELYKRLSNDFRCAITLMHGESPSYQRSP 180 
AGKTEMIY ++ VI+TGG VCIASPR+DVC+E+ RLS F C+I LMH ES YQR+P 
Sbjct: 124 AGKTEMIYAAIEAVINTGGWVCIASPRVDVCVEVATRLSQAFSCSICLMHAESLPYQRAP 183

Query: 181 LTIATTHQLLKFYHAFDLLIVDEVDAFPY\7DNPILYQGVKQALKENGTSIFLTATSTTEL 240 

+ +ATTHQLLKF+ AFDLLI +DEVDAFP+V+N L+ QALKE G I LTATST L 
Sbjct: 184 I IVATTHQLLKFHKAFDLLI IDEVDAFPFVNNIQIiHYAASQALKEGGAKILLTATSTRTL 243 


Query: 241 ERKVARKELKKLHLARRFHANPLVI PEMVWVSGIQKSLQTQKLPPKLYQL INKQRQTRYP 300 

ERKV + E+ KL LARRFH PLVIP+ + + K + QKLP K+ + + KQR+T YP 
Sbjct: 244 ERKVNKGEVVKLTLARRFHNRPLVIPKFIRSFNLFKMIHRQKLPLKILKYLKKQRKTGYP 303 

Query: 301 LLLFFPHISEGQVFTEILRQAFPMEKIGFVSSKSTSRLKLVQDFRDNKLSILVSTTILER 360

LL+F PI + T IL++ P E+I VSS+S +R + + FR K +IL++T+ILER 
Sbjct: 304 LLIFLPTIIMAESVTAILKELIiPAEQIACVSSQSQNRKEDITAFRQGKKTILITTSILER 363 

Query: 361 GVTFPSVDVFVIQANHHLFTKSSLVQISGRVGRALERPEGLLYFLHDGKSKSMHQAIKEI 420 
GVTFP +DVFV+ ++H +++ SLVQI+GRVGR+++RP+G LYF H+G SK+M A KEI

Sbjct: 364 GVTFPQIDVFVLGSHHRVYSSQSLVQIAGRVGRSIDRPDGTLYFFHEGISKAMLLARKEI 423 

Query: 421 KNMNHIG 427 
K MN+ G 

Sbjct: 424 KEMNYKG 430

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1006

A DNA sequence (GBSx1066) was identified in S.agalactiae <SEQ ID 3091> which encodes the amino
acid sequence <SEQ ID 3092>. This protein is predicted to be comF operon protein 3 (comFC). Analysis of
this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0894 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC44942 GB:U56901 involved in transformation [Bacillus subtilis]
Identities = 76/230 (33%), Positives = 118/230 (51%), Gaps = 11/230 (4%)

Query: 1 MTCLLCHEIDLSQLTFVELMLLKPKQNVICQTCKGSFEALSREMGCQTCCK-QIPQKQCQ 59 
M CLLC +T+ L LLKP + V C +C+ + ++ + C C + Q C+

Sbjct: 1 MICLLCDSQFSQD VTWRALFLLKPDEKV- CYSCRSKLKKITGHI - CPLCGRPQS VHAVCR 58 

Query: 60 DCIYWGKKGIEV NHFSLYRYNEAMKKNFSLFKFQGDYLLKDVFTKEIKAALKKY- - 113 

DC W + + + S+Y YN+ MK+ S FKF+GD + + F + + K 

Sbjct: 59 DCEVVTOTRIRDSLLLRQNRSVYTYNDMMKETLSRFKFRGDAEIINAFKSDFSSTFSKVYP 118

Query: 114 -KGYTIVPVPLSHEGYQNRQFNQyiAFLQSANlPYKNILSKKDGGEQSANNKEERLKQVQ 172 

K + +VP+PLS E + R FNQ + + P + L + + KQS K ERL 

Sbjct: 119 DKHFVLVPIPLSKEREEERGFNQAHLLftECIjDRPSHHPLIRLNNEKQSKKKKTERLLSEC 178 


Query: 173 QFTLKNEAELGDNLLIVDDIYTTGATIAQIRKLLEEKG-IKNIKSFSLAR 221 

F KN + G N++++DD+YTTGAT+ + L EKG ++ SF+L R 
Sbjct: 179 IFDTKNNSAEGMNIILIDDLYTTGATLHFAARCLLEKGKAASVSSFTLIR 228 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3093> which encodes the amino acid
sequence <SEQ ID 3094>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0763 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 100/222 (45%), Positives = 139/222 (62%), Gaps = 2/222 (0%)

Query: 1 MTCLLCHEIDLSQLTFVELMLLKPKQNVICQTCKGSFEALSREMGCQTCCKQIPQKQCQD 60 
M CLLC +1 + ++ E++ L+ + ICQ C+ SF+ + + + C TCC C+D 
Sbjct: 1 MICLLCQQISQTPISITEIIFLRRISSPICQQCQKSFQKIGKSV-CATCCANSDIIACRD 59

Query: 61 CIYWGKKGIEVNHFSLYRYNEAMKKNFSLFKFCGDYLLKDVFTKEIKAALKKY-KGYTIV 119 

C+ W KG VNH SLY YN AMK FS +KFQGDYLL+ VF E+ + KY KGY V 
Sbjct: 60 CLKWENKGYNVNHRSLYCYNAAI^KAYFSQYKFQGDYLLRJCVFAVELADVITKYYKGYIPV 119 






Query: 120 PVPLSHEGYQNRQFNQVIAFLQSANIPYKNILSKKDGGKQSANNKEERLKQVQQFTLKNE 179 

PVP+S ++ RQFNQV A L++AN+ Y ++ K D QS+ K+ERL + + L 
Sbjct: 120 PVPVSPGCFRERQFNQVSAILEAANVSYLSLFEKLDNTHQSSRTKKERLLVEKSYRLLKV 179 

Query: 180 AELGDNLLIVDDIYTTGATIAQIRKLLEEKGIKNIKSFSLAR 221

+ + D +LIVDDIYTTG+TI +RK L + +IKS S+AR 
Sbjct: 180 SNIPDKILIVT3DIYTTGSTIIALRKQLAKVANSDIKSLSIAR 221 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1007 

A DNA sequence (GBSx1067) was identified in S.agalactiae <SEQ ID 3095> which encodes the amino
acid sequence <SEQ ID 3096>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3889 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB91549 GB:AJ249134 hypothetical protein [Lactococcus lactis]

Identities = 107/185 (57%), Positives = 140/185 (74%), Gaps = 3/185 (1%)

Query: 1 M I KYS IRGENI EVTFAIREYVETKLSKVEKYFNEAQELDTRVNLKVYREKTAKVE VT I L I 60 
MIK++IRGEN+EVT+AIR YVE K+ K++KYFN+ E+ VNLKVY EK AKVEVT+ 
Sbjct: 1 MIKinfllRGENVEVTDAIRAYvEDKIGKLDKYFra 60

Query: 61 DS ITIiRAEDVSQDMYGS IDLVVDKIERQIRKNKTKIAKKYREKIPASQVFTTEFEAEPDE 120 

++TLRAED SQDMY SID V +K+ERQIRK KT++ +K R +P QVF EF 
Sbjct: 61 KNVTLRAEDTSQDMYSS IDFVEEKLERQIRKYKTRMNRKPR^VPTGQVFGDEFAPLDTT 120 


Query: 121 EAVSQ RIWTKNVNIiKPMDVEEALLQMELLGHDFFIYTDAEDNTTNVLYKREDGELG 177 

+ V++ IVRTK+V LKPMD EEA+LQM++LGHDF+++TDA+ N T+V+Y+R DG G 
Sbjct: 121 DEVAEDHVDIWTKHVALKPMDAEEAVLQMDMLGHDFYVFTDADSNGTHVVYRRTDGRYG 180 

Query: 178 LIEAK 182

LIE + 
Sbjct: 181 LIETE 185 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3097> which encodes the amino acid
sequence <SEQ ID 3098>. Analysis of this protein sequence reveals the following:

Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3751 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 145/182 (79%), Positives = 165/182 (89%)

Query: 1 MIKYSIRGENIEVTEAIREYWTKLSKVEKYFNEAQELDTRVNLKVYREKTAKVEVTILI 60 

MIK+SIRGENIEVTEAIR+YVE+KL+K+EKYF + QE+D RVNLKVYRE+++KVEVTI + 
Sbjct: 1 MIKFSIRGENIEVTEAIRDYWSKLTKIEKYFAKDQEIDARVI^KVYRERSSKVEVTIPL 60 






Query: 61 DSITLRAEDVSQDMYGSIDLWDKIERQIRKNKTKIAKKYREKIPASQVFTTEFEAEPDE 120 

DS+TLRAEDVSQDMYGSIDLWDKIERQIRKNKTKIAKK+REK+P QVFTTEFEAE + 
Sbjct: 61 DSVTLRAEDVSQDMYGSIDLVVDKIERQIRKNKTKIAKKHREKVPTGQVFTTEFEAEEVD 120 

Query: 121 EAVSQRIWTKmn^KPMDVEEALLQMELLGHDFFIYTDAEDNTTNVLYKREDGELGLIE 180

E ++VRTKNV LKPMDVEEA LQMELLGHDFFIYTD+ED TN+LY+REDG LGLIE 
Sbjct: 121 EIPEVQVVRTKWVTLKPMDVEEARLQMELLGHDFFIYTDSEDGATNILYRREDGNLGLIE 180 

Query: 181 AK 182

AK 

Sbjct: 181 AK 182 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1008

A DNA sequence (GBSx1068) was identified in S.agalactiae <SEQ ID 3099> which encodes the amino
acid sequence <SEQ ID 3100>. Analysis of this protein sequence reveals the following:

Possible site: 16
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0685 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1009

A DNA sequence (GBSx1077) was identified in S.agalactiae <SEQ ID 3101> which encodes the amino
acid sequence <SEQ ID 3102> (sgaT). Analysis of this protein sequence reveals the following:

Possible site: 41
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -5.95 Transmembrane 99 - 115 ( 87 - 115)
INTEGRAL Likelihood = -3.50 Transmembrane 43 - 59 ( 42 - 60)
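
Each "INTEGRAL" line reports one predicted transmembrane segment: a likelihood score, the segment range, and a second range printed in parentheses. The following Python sketch shows one way to collect these lines into records ordered by position along the protein. It is an assumption-laden illustration: the names Segment, parse_segments and the field names outer_start and outer_end are hypothetical labels chosen here, not terms used by the prediction program.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float   # score printed after "Likelihood ="
    start: int          # first residue of the segment
    end: int            # last residue of the segment
    outer_start: int    # first number of the range in parentheses
    outer_end: int      # second number of the range in parentheses


PATTERN = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return all segments found in text, sorted by start position."""
    segments = [
        Segment(float(score), int(a), int(b), int(c), int(d))
        for score, a, b, c, d in PATTERN.findall(text)
    ]
    return sorted(segments, key=lambda segment: segment.start)


if __name__ == "__main__":
    report = (
        "INTEGRAL Likelihood = -5.95 Transmembrane 99 - 115 ( 87 - 115)\n"
        "INTEGRAL Likelihood = -3.50 Transmembrane 43 - 59 ( 42 - 60)\n"
    )
    for segment in parse_segments(report):
        print(segment)
```

Sorting by start position rather than by score lists the segments in the order in which they occur along the sequence, which is convenient when comparing the GBS and GAS predictions.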

Final Results

bacterial membrane Certainty=0.3378 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB03942 GB:AP001507 unknown conserved protein [Bacillus halodurans]

Identities = 47/111 (42%), Positives = 76/111 (68%), Gaps = 5/111 (4%)

Query: 1 MAIIYLIVAVFAG--EAYIAKEI SNGVNGLVYALQLAGQFAAGVFVILAGVRLILGE 55 

M I++L+ A+ + A+E+ S + +YA+ + FA G+ V+L GV++ +GE 

Sbjct: 233 MGILFLVGAIILALKDTQGAQELIAQSGEQSFFIYAIIQSFMFAGGIAVVLLGVKMFIGE 292

Query: 56 IVPAFKGISEKLVPNSKPALDCPIVYPYAPNAVLIGFISKFVGGLVSMIVM 106 

+VPAF GI+ KLVP ++PALD P+V+P APNAV++GF+ FVG L+ ++V+ 
Sbjct: 293 WPAFNGIATKLVPGARPALDAPWFPMAPNAVILGFLGAFVGALIWLWI 343 






There is also homology to SEQ ID 516. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1010

A DNA sequence (GBSx1078) was identified in S.agalactiae <SEQ ID 3103> which encodes the amino
acid sequence <SEQ ID 3104>. This protein is predicted to be tryptophanyl-tRNA synthetase (trpS).
Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2156 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC05711 GB:L49336 tryptophanyl-tRNA synthetase [Clostridium longisporum]

Identities = 225/340 (66%), Positives = 271/340 (79%), Gaps = 3/340 (0%)



Query: 1 MTKPIILTGDRPTGKLHIGHYVGSLKNRVLLQNEGSYTLFVFLADQQALTDHAKDPQTIV 60

M K IILTGDRPTGKLHIGHYVGSLKNRV LQN G Y F+ +ADQQALTD+A++P+ I
Sbjct: 1 MAKEIILTGDRPTGKLHIGHYVGSLKNRVQLQNSGDYRSFIMIADQQALTDNARNPEKIR 60

Query: 61 ESIGNVALDYLAVGLDPNKSTLFIQSQIPELAELSMYYMNLVSLARLERNPTVKTEIAQK 120

S+ VALDYLAVG+DP KST+ +QSQIPEL EL+M+Y+NLV+L+RLERNPTVK EI QK
Sbjct: 61 NSLIEVALDYLAVGIDPLKSTILVQSQIPELNELTMHYLNLVTLSRLERNPTVKAEIKQK 120

Query: 121 GFGESIPAGFLVYPVAQAADITAFKANLVPVGTDQKPMIEQTREIVRSFNHAYNCQVLVE 180

F SIPAGFL+YPV+QAADITAFKA VPVG DQ PMIEQ REIVRSFN Y +VLVE
Sbjct: 121 NFENSIPAGFLIYPVSQAADITAFKATTVPVGEDQLPMIEQAREIVRSFNTIYGKEVLVE 180

Query: 181 PEGIYPENDAAGRLPGLDGNAKMSKSLNNGIFLADDMDTVKKKVMSMYTDPNHIKVEEPG 240

P+ + P+ GRLPG DG AKMSKS+ N I+LAD+ D +K+KVMSMYTDPNHIKV +PG
Sbjct: 181 PKAVIPKG-TIGRLPGTDGKAKMSKSIGNAIYLADEADVIKQKVMSMYTDPNHIKVTDPG 239

Query: 241 QIEGNMVFHYLDVFGRDEDQKEITAMKEHYQKGGLGDVKTKRYLLDILERELSPIRERRL 300

Q+EGN VF YLD F +D + E MK HY +GGLGDVK K++L +IL+ EL PIR RR
Sbjct: 240 QVEGNTVFTYLDTFCKDTETLE--EMKAHYSRGGLGDVKVKKFLNEILQAELEPIRNRRK 297

Query: 301 EYAKDMGQVYQMLQKGSEKAQAVAASTLDEVKSAMGLNYF 340

E+ KD+ +VY++L++GSEKA+ VAA TL EV+ +G+ YF
Sbjct: 298 EFQKDIPEVYRILKEGSEKAREVAAGTLKEVRETIGIEYF 337





A related DNA sequence was identified in S.pyogenes <SEQ ID 3105> which encodes the amino acid
sequence <SEQ ID 3106>. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2737 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 290/340 (85%), Positives = 316/340 (92%)

Query: 1 MTKPIILTGDRPTGKLHIGHYVGSLKNRVLLQNEGSYTLFVFLADQQALTDHAKDPQTIV 60
MTKPIILTGDRPTGKLH+GHYVGSLKNRV LQNE Y +FVFLADQQALTDHAK+ + I
Sbjct: 2 MTKPIILTGDRPTGKLHLGHYVGSLKNRVFLQNENKYKMFVFLADQQALTDHAKESELIQ 61

Query: 61 ESIGNVALDYLAVGLDPNKSTLFIQSQIPELAELSMYYMNLVSLARLERNPTVKTEIAQK 120
ESIGNVALDYL+VGLDP +ST+FIQSQIPELAELSMYYMNLVSLARLERNPTVKTEIAQK
Sbjct: 62 ESIGNVALDYLSVGLDPKQSTIFIQSQIPELAELSMYYMNLVSLARLERNPTVKTEIAQK 121

Query: 121 GFGESIPAGFLVYPVAQAADITAFKANLVPVGTDQKPMIEQTREIVRSFNHAYNCQVLVE 180
GFGESIP+GFLVYPV+QAADITAFKANLVPVG DQKPMIEQTREIVRSFNH Y+ LVE
Sbjct: 122 GFGESIPSGFLVYPVSQAADITAFKANLVPVGNDQKPMIEQTREIVRSFNHTYHTDCLVE 181

Query: 181 PEGIYPENDAAGRLPGLDGNAKMSKSLNNGIFLADDMDTVKKKVMSMYTDPNHIKVEEPG 240
PEGIYPEN+ AGRLPGLDGNAKMSKSL NGI+L+DD DTV+KKVMSMYTDPNHIK+E+PG
Sbjct: 182 PEGIYPENEKAGRLPGLDGNAKMSKSLGNGIYLSDDADTVRKKVMSMYTDPNHIKIEDPG 241

Query: 241 QIEGNMVFHYLDVFGRDEDQKEITAMKEHYQKGGLGDVKTKRYLLDILERELSPIRERRL 300
QIEGNMVFHYLD+F R EDQ +I AMKEHYQ GGLGDVKTKRYLLDILEREL+PIRERRL
Sbjct: 242 [sequence missing] 301

Query: 301 EYAKDMGQVYQMLQKGSEKAQAVAASTLDEVKSAMGLNYF 340
EYAKDMG+V++MLQ+GS+KA+ VAA TL EVKSAMG+NYF
Sbjct: 302 EYAKDMGEVFRMLQEGSQKARTVAAKTLSEVKSAMGINYF 341

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1011

A DNA sequence (GBSx1079) was identified in S.agalactiae <SEQ ID 3107> which encodes the amino
acid sequence <SEQ ID 3108>. This protein is predicted to be carbamate kinase. Analysis of this protein
sequence reveals the following:

Possible site: 24
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0013 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA04684 GB:AJ001330 carbamate kinase [Lactobacillus sakei]
Identities = 199/311 (63%), Positives = 254/311 (80%), Gaps = 3/311 (0%)

Query: 6 QKIWALGGNAILSTDASAKAQQEALINTSKSLVKLIKEGHDVIVTHGNGPQVGNLLLQQ 65
+KIWALGGNAILSTDASA AQ +A+ T K LV +K+G +I++HGNGPQVGNLL+QQ
Sbjct: 4 RKIWALGGNAILSTDASANAQIKAVKETVKQLVAFVKQGDQLIISHGNGPQVGNLLIQQ 63

Query: 66 [sequence missing]
AASDSEK PAMPLDT AM++G IG+W+QNA N L E+G+ +VAT+VTQ IVD KD+A
Sbjct: 64 [sequence missing]

Query: 126 [sequence missing]
F NPTKPIGPF SE +AKKQ + F EDAGRGWR+WPSP+P+GI+EA VI++LV+
Sbjct: 124 [sequence missing]

Query: 185 [sequence missing]
V+ ISAGGGGVPV ++ N L+GVEAVIDKDFAS+ L+ELV AD+ I+LT VDNV+V
Sbjct: 184 [sequence missing]

Query: 245 [sequence missing]
NFNKP+Q+KL V+V++++ YI ++QFA GSMLPK++ AI +V N+P+S+AIITSL+N+
Sbjct: 242 [sequence missing]

Query: 305 [sequence missing]
N+LA +AGT I
Sbjct: 302 NLLAHDAGTII 312

A related DNA sequence was identified in S.pyogenes <SEQ ID 3109> which encodes the amino acid
sequence <SEQ ID 3110>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0013 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 275/312 (88%), Positives = 295/312 (94%)

Query: 6 QKIWALGGNAILSTDASAKAQQEALINTSKSLVKLIKEGHDVIVTHGNGPQVGNLLLQQ 65

QKIWALGGNAILSTDASAKAQQFJiLI+TSKSLVKLIKEGH+VIVTHGNGPQVGNLLLQQ 
Sbjct: 4 QKIWALGGNAILSTDASAKAQQEALISTSKSLVKLIKEGHEVIVTHGNGPQVGNLLLQQ 63 






Query: 66 AASDSEKNPAMPLDTCTAMTEGSIGFWLQNALNNELQEC^ 125 

AA+DSEKNPAMPLDTCVAMTEGSIGFWL NAL+NELQ QGI KEVA WTQVIVD KD A 
Sbjct: 64 AAADSEKNPAMPLDTCVAMTEGSIGFMjVlJALDNELQAC^IQKEVAAVvTQVIVDAKDPA 123 



Query: 126 FTNPTKPIGPFLSEEDAKKQAQETGSKFKEDAGRGWRKWPSPKPVGIKEASVIRRLVDS 185

F NPTKPIGPFL+EEDAKKQ E+G+ FKEDAGRGWRKWPSPKPVGIKEA+VIR LVDS 
Sbjct: 124 FENPTKPIGPFLTEEDAKKQMAESGASFKEDAGRGWRKWPSPKPVGIKEANVIRSLVDS 183 

Query: 186 GVWISAGGGGVPVIEDANTKALKGVEAVIDKDFASQTLSELVDADLFIVLTC 245 
GVW+SAGGGGVPV+EDA +K L GVEAVIDKDFASQTLSELVDADLFIVLTGVDNV+VN

Sbjct: 184 GVVWSAGGGGVPVVEDATSKTLTGVEAVIDKDFASQTLSELVDADLFIVLTGVDNVYVN 243 

Query: 246 FNKPNQEKLEEVTVSQMKQYITENQFAPGSMLPKVEAAIAFVENKPESRAIITSLENIDN 305 
FNKP+Q KLEEVTVSQMK+YIT++QFAPGSMLPKVEAAIAFVENKP ++AIITSLENIDN 
Sbjct: 244 FNKPDQAKLEEVTVSQMKEYITQDQFAPGSMLPKVEAAIAFVENKPNAKAIITSLENIDN 303

Query: 306 VLAQNAGTQIVA 317 

VL+ NAGTQI+A 
Sbjct: 304 VLSANAGTQIIA 315 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1012 

A DNA sequence (GBSx1080) was identified in S.agalactiae <SEQ ID 3111> which encodes the amino
acid sequence <SEQ ID 3112>. This protein is predicted to be permease (potE). Analysis of this protein
sequence reveals the following:

Possible site: 52 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -12.63 Transmembrane 450 - 466 ( 441 - 478)
INTEGRAL Likelihood = -8.97 Transmembrane 236 - 252 ( 231 - 259)
INTEGRAL Likelihood = -8.70 Transmembrane 283 - 299 ( 277 - 308)
INTEGRAL Likelihood = -8.44 Transmembrane 165 - 181 ( 153 - 186)
INTEGRAL Likelihood = -7.96 Transmembrane 129 - 145 ( 126 - 151)
INTEGRAL Likelihood = -6.16 Transmembrane 396 - 412 ( 394 - 415)
INTEGRAL Likelihood = -5.15 Transmembrane 45 - 61 ( 38 - 63)
INTEGRAL Likelihood = -4.94 Transmembrane 335 - 351 ( 334 - 352)
INTEGRAL Likelihood = -3.72 Transmembrane 13 - 29 ( 10 - 30)
INTEGRAL Likelihood = -2.92 Transmembrane 417 - 433 ( 417 - 435)
INTEGRAL Likelihood = -1.54 Transmembrane 360 - 376 ( 360 - 376)
INTEGRAL Likelihood = -0.53 Transmembrane 207 - 223 ( 207 - 223)

Final Results 

bacterial membrane Certainty=0.6052 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
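
For readers who want to tabulate the predicted membrane-spanning regions reported throughout these Examples, the following is a minimal sketch, assuming each entry follows the single-line layout "INTEGRAL Likelihood = x Transmembrane a - b ( c - d)" used in this listing. The function name parse_tm_segments and the field labels "core" and "extended" are illustrative assumptions and do not come from the source or from any prediction program.

```python
import re

# Assumed layout, taken from the single-line entries in this listing:
#   INTEGRAL Likelihood = -0.53 Transmembrane 207 - 223 ( 207 - 223)
# "core" and "extended" are illustrative labels, not terms from the source.
TM_LINE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return one dict per transmembrane entry found in text."""
    segments = []
    for match in TM_LINE.finditer(text):
        likelihood, start, end, ext_start, ext_end = match.groups()
        segments.append(
            {
                "likelihood": float(likelihood),
                "core": (int(start), int(end)),
                "extended": (int(ext_start), int(ext_end)),
            }
        )
    return segments


# Three entries copied from this listing.
sample = """
INTEGRAL Likelihood = -0.53 Transmembrane 207 - 223 ( 207 - 223)
INTEGRAL Likelihood = -0.48 Transmembrane 171 - 187 ( 171 - 188)
INTEGRAL Likelihood = -5.41 Transmembrane 121 - 137 ( 118 - 140)
"""
segments = parse_tm_segments(sample)

# Pick the entry with the most negative likelihood value.
most_negative = min(segments, key=lambda seg: seg["likelihood"])
print(most_negative)
```

The regular expression tolerates the variable spacing seen in the listing; entries that are split across lines or truncated would need to be rejoined before parsing.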

A related GBS nucleic acid sequence <SEQ ID 10295> which encodes amino acid sequence <SEQ ID 
10296> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA76779 GB:Y17554 permease [Bacillus licheniformis] 
Identities = 265/470 (56%), Positives = 347/470 (73%), Gaps = 3/470 (0%) 

Query: 5 MEKEKKLGLLPLTMLVIGSLIGGGIFDLMQNMSSRAGLVPMLIAWVITAIGMGTFVLSFQ 64 

M +EKKLGL L LVIGS+IGGG F+L +M+S AG +LI W+IT +GM SFQ 
Sbjct: 1 NEffiEKKlGLFALIALVIGSMIGGGAFNLASDMASGAGAGAILIGWIITGVGMIALAFSFQ 60 

Query: 65 NLSEKRPDLTAGIFSYAKEGFGNFMGFNSAWGYWLSAWLGNVAYAALLFSSLGYFFKFFG 124 

NL+ KRPDL GIF+YA+EGFG+FMGFNS WGYW +A LGNVAY LLFS++GYF FG 
Sbjct: 61 NLTTKRPDLDGGIFTYAREGFGHFMGFNSGWGYWFAALLGNVAYGTLLFSA1GYFIPAFG 120 

Query: 125 NGNNI IS I IGAS IVIWWHFLILRGVNTAAFINTI VTFAKLVPVI IFLI SALLAFKFNI F 184 

+G NI SIIGAS+++W VHFLILRGV +AA IN I T +KLVP+ F+I+ + F ++F 
Sbjct: 121 DGQNIASIIGASVILWCVHFLILRGVQSAAMINLITTISKLVPIFAFIIAIIFVFHLDLF 180 

Query: 185 SLDIWGNGLH-QSIFNQVNSTMKTAVWVFIGIEGAWFSGRAKKHSDIGKASILALFTMI 243 

+ D WG GL SI QV STM VWVF GIEGAV+FS RAKK SD+GKA+++ L +++ 
Sbjct: 181 TNDFWGKGLSLGSIGTQVKSTMLVTVWVPTGIEGAVLFSSRAKKSSDVGKATVIGLISVL 240 

Query: 244 SLYVLISVLSLGIMSRPELftNLKTPAMAYVLEKAVGHWGAILVNLGVIISVFGAII^WTL 303 

+YV+I++LSLG+M++ LA L P+MA ++E VG WGA+L+NLG+IISV GA LAWTL 
Sbjct: 241 VIYVMITMLSLGVMNQQNLAELPNPSMAAIMEHIVGKWGAVLINLGLIISVLGAWLAWTL 300 

Query: 304 FAAELPYQAAKEGAFPKFFAKENKNKAPINSLLVTNLCVQAFLITFLFTQSAYRFGFALA 363 

FA ELP AA+EG FPK+F KENKN AP N+L +TN +Q FL+TFL + +AY+F F+LA 
Sbjct: 301 FAGELPLIAAREGVFPKWFGKENKNGAPTNALTLTNAIIQLFLLTFLISDAAYQFAFSLA 360 

Query: 364 SSAILI PYAFTALYQLQFTLREDKSTPGHQKNLI IGILATIYA VYLI YAGGFDYLLLTMI 423 

SSAILIPY F+ LYQL+++ + P KNLIIGI+A+IY V+L+YA G DYLLLTMI 
Sbjct: 361 SSAILI PYLFSGLYQLKYSWLHKE - - PNRGKNLI IGI IAS I YGVWLVYAAGLDYLLLTMI 418 

Query: 424 AYTLGMILYIKMRKDDKLPIFVGYEKISAIVILALCLLCIIEIMTGQIDI 473 

Y G++++ +RK + P+F E + A +IL L ++ +1 + +G I I 
Sbjct: 419 LYAPGILVFRAVRKGKEGPVFNKAELLIAALILVLAVIAVIRLASGSISI 468 
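
The summary line preceding each alignment reports counts and percentages for identities, positives and gaps. The following is a minimal sketch for extracting those figures and recomputing the percentages. The helper name parse_alignment_stats is hypothetical, and the use of integer truncation is an assumption that happens to reproduce the figures of the alignment above; it is not a statement about how the alignment program computes them.

```python
import re

# Matches fields such as "Identities = 265/470 (56%)".
STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")


def parse_alignment_stats(line):
    """Return count, aligned length, reported and recomputed percent per field."""
    stats = {}
    for name, count, length, reported in STAT.findall(line):
        count, length = int(count), int(length)
        stats[name] = {
            "count": count,
            "length": length,
            "reported_pct": int(reported),
            # Assumption: percentages are truncated, not rounded.
            "truncated_pct": count * 100 // length,
        }
    return stats


# Summary line copied from the alignment above.
line = "Identities = 265/470 (56%), Positives = 347/470 (73%), Gaps = 3/470 (0%)"
stats = parse_alignment_stats(line)
for name, values in stats.items():
    print(name, values)
```

Comparing reported_pct with truncated_pct is one way to flag summary lines whose digits may have been damaged in transcription.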

A related DNA sequence was identified in S.pyogenes <SEQ ID 3113> which encodes the amino acid
sequence <SEQ ID 3114>. Analysis of this protein sequence reveals the following:

Possible site: 51
>>> Seems to have no N-terminal signal sequence



INTEGRAL 


Likelihood 
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Likelihood 
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66 


( 
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.59 
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( 
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INTEGRAL 


Likelihood 




-6. 


.21 
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INTEGRAL 


Likelihood 




-5. 


.84 


Transmembrane 
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( 


153 




183) 


INTEGRAL 


Likelihood 




-2. 


.02 


Transmembrane 


105 
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( 


105 
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INTEGRAL 


Likelihood 




-1. 


.49 
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414 
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( 


414 
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INTEGRAL 


Likelihood 




-0. 


.69 
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296 


( 
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INTEGRAL 


Likelihood 
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.59 
Final Results

bacterial membrane Certainty=0.5607 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB85052 GB:AE000837 cationic amino acid transporter related 
protein [Methanobacterium thermoautotrophicum] 
Identities = 108/422 (25%), Positives = 213/422 (49%), Gaps = 36/422 (8%)

Query: 26 INAVIGSGIFLLPRAIYKGLGPASIAVMFGTAILTiraAVCFAEVSGYFGKNGGAFQYSK 85 

+ ++G+ LGPASI ++ +++A+ F+E S + GG + Y+ 

Sbjct: 19 VGTIVGADIYI VAAYGAGSLGPASIIAWLIiAGLMALIIALVFSFASAMLPRTGGPYVYAG 78 

Query: 86 RAFGDFIGFNVGFLGWTVTIFAWAAMAAGFARMFIITFPAFEGWHIPL SIGL 137 

A G F GF GW++ + +W A+A +F + F + + IPL + 

Sbjct: 79 EALGRFTGF ITGWSLWVSSWVAIA VFPLAFIYYLEYFIPLDPPAEAVIKVLF 130 

Query: 138 IILLSLMNIAGLKTSKIvTITATIAKLIPI'VAFCACTIiFFIKNG LPNFTPFVQLEP 193 

1+ L+++NIAG+ + V TI K+ P++ F + + N+TP + 

Sbjct: 131 I LSLTI INI AGVGRAGKVND I LTI LKVAP VLLFA VLGAIHLALNPGLLVSNYTPAAPMG - 189 

Query: 194 GTNLLGAISNTAVYIFYGFIGFETLSIVAGEMRDPEKNVPRALLGSISIVSVLYMLIIGG 253 

LGA+ V +F+ ++GFE +++ A E+RDPE+ +P ++ + V++ Y+L 
Sbjct: 190 LGALGTVTVLVFWAYVGFEL VTVPADE VRDPERTI PLS I TLGMI FVTLFYI LTNA V 245 

Query: 254 TIAMLGSQIMMTN-APVQDAFVKMIGPAGAWMVSIGALISITGtNMGESIMVPRYGAAIA 312 

+ ++ +++ ++ AP+ A ++G GA +++ GA+ SI G + R A++ 

Sbjct: 246 ILGLVPWRVIASSTAPLTVAGYSLMGGIGALILTAGAVFSIAGSEEAGMLTTARLLFAMS 305 

Query: 313 DEGLLPAAIAKQNQN-GAPLVAILVSGAIAIVLLLTGSFESLAKLSWFRFFQYIPTALA 371 

++G LP +++ ++ G P ++ILV A++ LTG+ L +LSW Y T ++ 

Sbjct: 306 EDGFLPGFLSRVHRRFGTPHMSILVQNLTALLAALTGTVSGLIELSVvTLLLPYAvTCIS 365 

Query: 372 VNKLRKDDPDANVIFRVPFGPIIPII^VIVSLvMIWGDOTMNFVYGAVGVIIASSvYYLM 431 

+ LR+ D P+ +L V+V + ++ P +G + +I++ + YL+ 

Sbjct: 366 LAI LRRRDGSGI PLKSVLGVLVCIYLLMNTTPSTTAWGLL-LILSGAPLYLI 416 

Query: 432 HG 433 
G 

Sbjct: 417 FG 418 

An alignment of the GAS and GBS proteins is shown below.

Identities = 104/368 (28%), Positives = 162/368 (43%), Gaps = 32/368 (8%)

Query: 1 MRYKMEKEKKLGLLPLTMLVIGSLIGGGIFDLMQNMSSRAGLVPMLIAWVI-TAIGMGTF 59 

M + ++ K L T+ I ++IG GIF L + + GL P IA + TAI 
Sbjct: 6 MNEQEREQAKFSLSGATLYGINAVIGSGIFLLPRAIYK--GLGPASIAVMFGTAILTIML 63 

Query: 60 VLSFQNLSEKRPDLTAGIFSYAKEGFGNFMGFNSA WGYWLSAWLGNVAYAALLFSSL 116 

+ F +S G F Y+K FG+F+GFN W + AW A A +F 

Sbjct: 64 AVCFAEVSGYFGK-NGGAFQYSKRAFGDFIGFNVGFLGWTVTIFAWAAMAAGFARMFIIT 122 

Query: 117 GYFFKFFGNGNNIISIIGASIVIWVVHFLILRGVNTARFINTIVTFAKLVPVIIFLISAL 176 

F+ G +1 IG I++ +++ + G+ T+ + T AKL+P++ F L 
Sbjct: 123 FPAFE GWHIPLSIGLIILLSLMN IAGLKTSKIVTITATIAKLIPIVAFCACTL 175 

Query: 177 LAFK FNIFSLDIWGNGLHQSIFNQVNSTMKTAVWVFIGIEGAWFSGRAKKHSDI 231 

K F F G L +1 N TAV++F G G S A + D 

Sbjct: 176 FFIKNGLPNFTPFVQLEPGTNLLGAISN TAVYIFYGFIGFETLSIVAGEMRDP 228

Query: 232 GKASILALFTMISLYVLISVLSLG IMSRPELANLKTPAM-AYVLEKAVGHWGAILVN 287 

K AL IS+ ++ +L +G M ++ P A+V K +G GA +V+ 

Sbjct: 229 EKNVPRALLGSISIVSVLYMLIIGGTIAMLGSQIMMTNAPVQDAFV--KMIGPAGAWMVS 286 

Query: 288 LGVIISVFGAILAWTLFAAELPYQAAKEGAFPKFFAKENJCNKAPINSLLVTNLCVQAFLI 347 

+G +IS+ G + ++ A EG P AK+N+N AP+ ++LV+ L+ 

Sbjct: 287 IGALISITGLNMGESIMVPRYGAAIADEGLLPAAIAKQNQNGAPLVAILVSGAIAIVLLL 346

Query: 348 TFLFTQSA 355

T F A 
Sbjct: 347 TGSFESLA 354 

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9079> which encodes the amino 
acid sequence <SEQ ID 9080>. Analysis of this protein sequence reveals the following: 

Possible site: 60
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.4970 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS sequences follows: 
Score = 62.1 bits (148), Expect = 2e-11

Identities = 59/250 (23%), Positives = 107/250 (42%), Gaps = 12/250 (4%)



Query: 143 WGSYLKGLLAN--YNIVLPNALNGTFNL--KNGTYIDILPV-LVMFFVTGIVLMNSKLAL 197
WG +L L N Y +L ++L F 11+ +V++ V ++L A
Sbjct: 95 WGYWLSAWLGNVAYAALLFSSLGYFFKFFGNGNNIISIIGASIVIWWHFLILRGVNTAA 154

Query: 198 RFNSFLVILKFSALALFIFVGIFFIDHNNWSHFAPYGVGQITGGKTGIFAGASVMFFAFL 257
N+ + K + +F+ + N +S +G G + + + F+
Sbjct: 155 FINTIVTFAKLVPVIIFLISALIAFKFNIFS-LDIWGNGLHQSIFNQvNSTMKTAVWVFI 213

Query: 258 GFESISMAVDEVKEPQKTIPKGIILSLIIVTALYIWTTILTGIV HYTKLNVPDAVA 314
G E + K+ IK IL+L + +LY++++ + GI+ L P A+A
Sbjct: 214 GIEGAWFSGRAKK-HSDIGKASILALFTMISLYVLISvIaSLGIMSRPELANLKTP-AMA 271

Query: 315 FALRNIRLYWAADYVSIVAILTLITVCISWrYALARTIYSISRDGLLPKSLYTLTKKNKV 374
+ L +W A V++ I+++ ++ T A Y +++G PK + KNK
Sbjct: 272 YvLEKAVGHWGAILVNLGVIISVFGAIIAWTLFAAELPYQAAKEGAFPK-FFAKENKNKA 330

Query: 375 PQNATLVTGL 384
P N+ LVT L
Sbjct: 331 PINSLLVTNL 340





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1013 

A DNA sequence (GBSx1081) was identified in S.agalactiae <SEQ ID 3115> which encodes the amino
acid sequence <SEQ ID 3116>. This protein is predicted to be unnamed protein product (argF). Analysis of
this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3757 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3117> which encodes the amino acid
sequence <SEQ ID 3118>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.48 Transmembrane 171 - 187 ( 171 - 188) 

Final Results 

bacterial membrane Certainty=0.1192 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB12563 GB:Z99108 similar to metabolite transporter [Bacillus subtilis] 
Identities = 190/467 (40%), Positives = 284/467 (60%), Gaps = 13/467 (2%)

Query: 25 TIFRKK KKYSNKTEMQRHFKVIDLVFLGLGSMVGTGIFTVTGIGAAKYAGPALTI 79 

++FRKK S + R DL LG+G ++GTGIF +TG AA AGPAL I 

Sbjct: 3 SLFRKKPLETLSAQSKSKSLARTLSAFDLTLLGIGCVIGTGIFVITGTVAATGAGPALII 62 

Query: 80 SI I ISAIAIGILALFYAEFASRMPSNGGAYSYVYATLGEFPAWLVGWYI IMEFLTAISSV 139 

S I++ +A + A YAEF+S +P +G YSY Y TLGE A+L+GW +++E++ A+S+V 
Sbjct: 63 SFILAGIACAIAAFCYAEFSSSIPIS6SVYSYSYVTLGELLAFLIGWDLMLEYVIALSAV 122 

Query: 140 AVGWGSYLKGLLANYNIVLPNAUSGTFNLKNGTYIDILPVLVMFFOTGIVLMNSKLALRF 199 

A GW SY + LLA +N+ +P AL G G ++ +++ +T IV K + RF 

Sbjct: 123 ATGWS SYFQSLLAGFNLH I PAALTGAPGSMftGAVFNLPAAVI ILLI TAI VSRGVKESTRF 182 

Query: 200 NSFLVILKFSALALFIFVGIFFIDHNNWSHFAPYGVGQITGGKTGIFAGASVMFFAFLGF 259 

N+ +V++K + + LFI VGI ++ +NWS F P+G+ G+ A+ +FFA+LGF 

Sbjct: 183 NNVIVLMKIAI ILLFI IVGIGYVKPDNWSPFMPFGM KGVI LSAATVFFAYLGF 235 

Query: 260 ESISMAVDEVKEPQKTIPKGIILSLIIVTALYIVVTTILTGIVHYTKLNVPDAVAFALRN 319 

+++S A +EVK PQK +P GII +L + T LYI V+ +LTG++ Y KLNV D V+FAL+ 
Sbjct: 236 DAVSNASEEVKNPQKNMPVGIISALAVCTVLYIAVSLVLTGMMPYAKLNVGDPVSFALKF 295 

Query: 320 IRLYWAADYVSIVAILTLITVCISMTYALARTIYSISRDGLLPKSLYTLTKKNKVPQNAT 379 

+ A +S+ AI+ + TV +++ YA R +++SRDGLLP + KPT 

Sbjct: 296 VGQDAVAGIISVGAIIGITTVMLALLYAQVRLTFAMSRDGLLPGLFAKVHPSFKTPFRNT 355 

Query: 380 LVTGLLAMICAGI FPLSSLAEFVNI CTLAYLI ILSGAI IKLRRIEGEPKANEFKTPLVPF 439 

+TG++A AG L +LA VN+ TLA ++S A+I LR+ E KA+ F+ P VP 
Sbjct: 356 WLTGIVAAGIAGFINLGTLAHLVNMGTLAAFTVISIAVIVLRKICHPEIKAS-FRVPFVPV 414 

Query: 440 LPMLAIIICLSFMSQYKAFTWIAFAIATIIGTLIYIiAYGYTHSIENK 486 

+P+++ ICL FM TW++F I +GTL+Y Y HS+ NK 

Sbjct: 415 VPIISAGICLWFMYSLPGVTWLSFVIWIAVGTLVYFLYSRKHSLLNK 461 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 312/337 (92%), Positives = 324/337 (95%)

Query: 1 MTQVFQGRSFLAEKDFSREEFEYLIDFSAHLKDLKKRGVPHHYLEGKNIALLFEKTSTRT 60 

MTQVFQGRSFLAEKDF+R E EYLIDFSAHLKDLKKRGVPHHYLEGKNIALLFEKTSTRT 
Sbjct: 1 MTQVFQGRSFLAEKDFTRAELEYLIDFSAHLKDLKKRGVPHHYLEGKNIALLFEKTSTRT 60 

Query: 61 RAAFTTAAIDLGAHPEYLGANDIQLGKKESTEDTAKVLGRMFDGIEFRGFSQRMVEELAE 120 

RAAFTTAAIDLGAHPEYLGANDIQLGKKESTEDTAKVLGRMFDGIEFRGFSQRMVEELAE 
Sbjct: 61 RAAFTTAAIDLGAHPEYLGANDIQLGKKESTEDTAKVLGRMFDGIEFRGFSQRMVEELAE 120

Query: 121 FSGVPVWNGLTDEWHPTQMLADYLTIKENFGK1.EGITLOTCGDGRMNVANSLLVAGTLMG 180

FSGVPVWNGLTDEWHPTQMLADY T+KENFGKLEGVTLVYCGDGRNNVANSLLV G ++G 
Sbjct: 121 FSGVPVWNGLTDEWHPTQMLADYFTOKENFGKLEGLTLWCGDGRNWANSLLVTGAILG 180 

Query: 181 VNVHIFSPKELFPAEEIVKLAEEYAKESGAHVLVTDlWDEAVKGADVFyTDVWVSMGEED 240 

VNVHIFSPKELFP EEIV LAE YAKESGA +L+T++ DEAVKGADV YTDVWVSMGEED 
Sbjct: 181 VNVHIFSPKELFPEEEIVTLAEGYAKESGARILITEDADEAVKGADVLYTDVWVSMGEED 240 

Query: 241 KFKERVELLQPYQVNMELIKKANNDNLIFLHCLPAFHDTmVYGKDVAEKFGVKEMEVTD 300 

KFKERVELLQPYQVNM+L++KA ND LIFLHCLPAFHDTNTVYGKDVAEKFGVKEMEVTD 
Sbjct: 241 KFKER VELLQPYQ VNMDLVQKAGNDKL I FLHCLPAFHDTNTVYGKDVAEKFGVKEMEVTD 300 

Query: 301 EVFRSKYARHFDQAENRMHTIKAVMAATLGNLFIPKV 337 

EVFRSKYARHFDQAEDJRMHTIKAVMAATLGNLFIPKV 
Sbjct: 301 EVFRSKYARHFDQAENRMHTI KAVMAATLGNLFI PKV 337 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1014 

A DNA sequence (GBSx1082) was identified in S.agalactiae <SEQ ID 3119> which encodes the amino
acid sequence <SEQ ID 3120>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.0456 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10921> which encodes amino acid sequence <SEQ ID 
10922> was also identified. 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3121> which encodes the amino acid 
sequence <SEQ ID 3122>. Analysis of this protein sequence reveals the following: 

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -5.41 Transmembrane 121 - 137 ( 118 - 140) 



Final Results 

bacterial membrane Certainty=0.3166 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 65/113 (57%), Positives = 83/113 (72%)

Query: 31 MEEEFDDNDEQDTIYAVLYDGKQPVSTGRFLPETQTEARLTRIATLKGYRGNGYGTKIII 90 

M ++FD NDE T+YAV+YD QPVSTG+FL ET+ EARLTRI TL Y G GYG K+ 
Sbjct: 1 MADKFDANDETRTVYAVWDNDQPVSTGQFIAETKIEARLTRIVTl^ADYCGCGYGAKVTE 60 

Query: 91 ALENYAKENGYHYLTIHAELTAKDFYQTLGYCATGNIYMEDGEACQTLEKYLI 143 

ALE Y + G++ LTIH+ELTA+ FY+ LGYQ+ G +EDGE CQ+L K ++ 
Sbjct: 61 ALETYTRREGFYQLTIHSELTAQTFYENLGYQSYGPKCLEDGEYCQSLAKTIL 113 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1015

A DNA sequence (GBSx1083) was identified in S.agalactiae <SEQ ID 3123> which encodes the amino
acid sequence <SEQ ID 3124>. Analysis of this protein sequence reveals the following: 

Possible site: 58
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2160 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3125> which encodes the amino acid 
sequence <SEQ ID 3126>. Analysis of this protein sequence reveals the following: 

Possible site: 58
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.2730 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 375/411 (91%), Positives = 395/411 (95%), Gaps = 1/411 (0%)

Query: 1 MTQTHPIHVFSEIGKLKKVMLHRPGKEIENLMPDYLERLLFDDIPFLEDAQKEHDAFAQA 60

MT PIHV+SEIGKLKKV+LHRPGKEIENmPDYLERLLFDDIPFLEDftQKEHDAFAQA 
Sbjct: 1 MTAQTPIHVYSEIGKLKKVLLHRPGKEIENLMPDYLERLLFDDIPFLEDAQKEHDAFAQA 60 

Query: 61 LRNEGVEV1YLENLAAESLTNQEIREQFIDEYIGEANTOGRATKKAIRELLLNIKDNKEL 120 
LR+EG+EVLYLE LAAESL EIRE FIDEY+ EAN+RGRATKKAIRELL+ I+DN+EL

Sbjct: 61 LRDEGIEVLYLETLAAESLVTPEIREAFIDEYLSEANIRGRATKKAIRELLMAIEDNQEL 120 

Query: 121 IEKTMAGIQKSELPEIPSSEKGLTDLVESNYPFAIDPMPNLYFTRDPFATIGNGVSLNHM 180 
IEKTMAG+QKSELPEIP+SEKGLTDLVESNYPFAIDPMPNLYFTRDPFATIG GVSLNHM 
Sbjct: 121 IEKTMAGVQKSELPEIPASEKGLTDLVESNYPFAIDPMPNLYFTRDPFATIGTGVSLNHM 180

Query: 181 FSETRNRETLYGKYIFTHHPEYGG-KVPMVYEREETTRIEGGDELVLSKDVLAVGISQRT 239 

FSETRNRETLYGKYI FTHHP YGG KVPMVY+R ETTRIEGGDELVLSKDVLAVGISQRT 
Sbjct: 181 FSETRNRETLYGKYI FTHHPIYGGGKVPMVYDRNETTRIEGGDELVLSKDVLAVGISQRT 240 


Query: 240 DAASIEKLLTOIFKQNLGFKKVLAFEFANNRKFMHLDTVFTMVDYDKFTIHPEIEGDLRV 299 

DAASIEKLLVNIFKQNLGFKKVLAFEFANNRKFMHLDTVFTMVDYDKFTIHPEIEGDLRV 
Sbjct: 241 DAASIEKLLWIFKQNLGFKKVIAFEFANNRKFMHLDTVFTMVDYDKFTIHPEIEGDIiRV 300 

Query: 300 YSVTYENQDLHIEEEKGDLADLLAKNLGVEKVELIRCGGDNLVAAGREQWNDGSNTLTIA 359

YSVTY^-N++LHI EEKGDLA+LLA NLGVEKV+LIRCGGDNLVAAGREQWNDGSNTLTIA 
Sbjct: 301 YSVTYDNEELHIVEEKGDLAELIiAANLGvEKVDLIRCGGDNLVAAGREQWNDGSNTLTIA 360 

Query: 360 PGWIVYNRNTITNAILESKGLKLIKINGSELVRGRGGPRCMSMPFEREDL 410 
PGW+VYNRNTITNAILESKGLKLIKI+GSELVRGRGGPRCMSMPFERED+

Sbjct: 361 PGVVWYNRNTITNAILESKGLKLIKIHGSELVRGRGGPRCMSMPFEREDI 411 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1016

A DNA sequence (GBSx1084) was identified in S.agalactiae <SEQ ID 3127> which encodes the amino
acid sequence <SEQ ID 3128>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm Certainty=0.3162 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8703> which encodes amino acid sequence <SEQ ID 8704> 
was also identified. This protein has an RGD motif and has homology with the following sequences in the 
GENPEPT database. 

>GP:AAG07568 GB:AE004834 hypothetical protein [Pseudomonas aeruginosa] 
Identities = 42/132 (31%), Positives = 74/132 (55%), Gaps = 3/132 (2%)

Query: 35 IQTYRKAYQTFKTK-KGARSSIEALLKRVNSGNEITSINPLVDIYNAASLRFGLPIGAED 93 

+ + +A++ F K + S EAL KR + SI+P+VD+YNA S++F +P+G E+ 

Sbjct: 63 LAAWAEAFRRFGAKPQRTPCSAFALRKRALRDGGIiPSIDPVVDLYNAISVQFAIPVGGEN 122 

Query: 94 SDTFRGDLKLTITNGGDEFYLI- -GEDFNRPTLSGELAYVDDVGAVCRCFNWRDGKRTMI 151 

+ G +L + +G + F + GE + GE+ + DD+G CR +NWR G RT + 

Sbjct: 123 LAAYAGPPRLWADGSETFDTLKNGFJ^DESPDPGEVVWRDDLGVTCRRWNWRQGVRTRL 182 

Query: 152 TDNTQNAFLVIE 163 

+ + + ++E 
Sbjct: 183 DASARRMWFILE 194 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3129> which encodes the amino acid 
sequence <SEQ ID 3130>. Analysis of this protein sequence reveals the following: 
Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.0700 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 127/199 (63%), Positives = 155/199 (77%)

Query: 8 ELKQLLSDSHSLAKKYLQEKEFSQNRVIQTYRKAYQTFKTKKGARSSIEALLKRVNSGNE 67
++KQLL+DSH LAK YL FS N+V+Q YRKAYQ FKTKKGARSSIEALLKRV++G
Sbjct: 36 DWQLIiADSHELAKAYLTADNFSDNQWQVYRKAYQHFKTKKGARSSIEALLKRVSNGQS 95

Query: 68 ITSINPLVDIYNAASLRFGLPIGAEDSDTFRGDLKLTITNGGDEFYLIGEDFNRPTLSGE 127
I SINPLVDIYNAASLRFGLP GAEDSD+F GDL+LTIT+GGD+FYLIG+ N PTL E
Sbjct: 96 IPSINPLVDIYNAASLRFGLPAGAEDSDSFIGDLRLTITDGGDDFYLIGDADNNPTLPNE 155

Query: 128 IiAYVDDVGAVCRCFNWPJDGKRTMITDNTQNAFLVIELIDNGREIIFKEALDFIATNTNRF 187
L Y DD+GA CRC NWRDG+RTM+T++T+NAFL+IE +D + +EAL FI + +
Sbjct: 156 LCYKDDIGAFCRCLNWRDGERTMVTEHTKNAFLIIEALDQEGQNRLQEALKFIEGSAKMY 215

Query: 188 LKAKTQTIILDKEHSEITL 206
L A T +LDK++ + L
Sbjct: 216 LHAITSVHVLDKDNPHVPL 234





SEQ ID 8704 (GBS298) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 44 (lane 2; MW 29kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 48 (lane 5; MW 54kDa).

The GBS298-GST fusion product was purified (Figure 203, lane 9) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 297), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1017 

A DNA sequence (GBSx1085) was identified in S.agalactiae <SEQ ID 3131> which encodes the amino
acid sequence <SEQ ID 3132>. Analysis of this protein sequence reveals the following: 

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.3770 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1018 

A DNA sequence (GBSx1086) was identified in S.agalactiae <SEQ ID 3133> which encodes the amino
acid sequence <SEQ ID 3134>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.4263 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB95946 GB:Y17554 Crp/Fnr family protein [Bacillus licheniformis]
Identities = 85/214 (39%), Positives = 126/214 (58%), Gaps = 14/214 (6%)

Query: 11 RQLDDFKHFTIEQFDHIVSHIKHRTALKNHTLFFEGDYREKLFLIQSGHVKIEQSDASGS 70 

R L+D K F I R+ K IiF E D RE+++L+ G +K+E+S+ +GS 

Sbjct: 22 RDLEDMKQF IYWRSYHKGQILFMEDDPRERMYLLLDGFIKLEKSNEAGS 70 


Query: 71 FIYTDYWQGTVFPYGGLFLDDDYHFSAVAITDIEYFSLPMALYEEYSLQNINQMKHLCR 130 

YTDYVR T+FP+GGLF D+ YH++A A+TDIE + +PM ++E+ N N + + 
Sbjct: 71 MFYTDYVRPHTLFPFGGLFRDEHYHYAAEALTDIELYYI PMNI FEDL vRDNKNLLYD I EN 130 

Query: 131 KYSKLLRVHEIRLRNMVTSSASMRVIQSLATIi LLQVPTERGHLPFPITTIEIANMSG 187

S +L +HE RL+ + S A RV Q++ L L Q + + PIT EIA +SG 
Sbjct: 131 HLSDILALHEERLKRITLSHAHDRVTQAIYYLTESLGQKESNSTVINCPITAAEIAKISG 190 

Query: 188 TTRETVSHVLKELRQKDIVEMKGKKLLYNNKNYF 221 
T+RETVS VLK+LR + ++ K+++ N YF

Sbjct: 191 TSRETVSAVLKKLRCEGVISQMNKQIMINRPEYF 224

A related DNA sequence was identified in S.pyogenes <SEQ ID 3135> which encodes the amino acid
sequence <SEQ ID 3136>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.4473 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 130/224 (58%), Positives = 180/224 (80%)

Query: 1 MITKEQYFYFRQLDDFKHFTIEQFDHIVSHIKHRTALKNHTLFFEGDYREKLFLIQSGHV 60 

+1 +E Y Y R+L+DF++F+IEQFD IV ++ R A K+H LFFEGD R+KLFL+ SG+ 
Sbjct: 1 VIRREDYQYLRKLNDFRYFSIEQFDKIVGQMEFRKAKKDHILFFEGDKRDKLFLVTSGYF 60 

Query: 61 KIEQSDASGSFIYTDYVRQGTVFPYGGLFLDDDYHFSAVAITDIEYFSLPMALYEEYSLQ 120 

K+EQSD SG+F+YTD++R GT+FPYGGLF DD YHFS VA+TD+ YF P+ L+E+YSL+ 
Sbjct: 61 KVEQSDQSGTFMYTDFIRHGTIFPYGGLFTDDYYHFSWAMTDVTYFYFPVDLFEDYSLE 120 

Query: 121 NINQMKHLCRKYSKLLRVHEIRLRNMVTSSASMRVIQSLATLLLQVPTERGHLPFPITTI 180 

N QMKHL K SKLL +HE+R+RN++TSSAS RVIQSLA LL+++ + LPF +TT 
Sbjct: 121 NRLQMKHLYSKMSKLLELHELRVRNLITSSASSRVIQSLAILLVEMGKDSDTLPFQLTTT 180 

Query: 181 E I ANMSGTTRE WSHVLKELRQKD I VEMRGKKLLYNNKNYFKKF 224 

+IA +SGTTRETVSHVL++L++++++ +KGK L Y +K+YF ++ 
Sbjct: 181 DIAQISGTTRETVSHVLRDLKKQELITIKGKYIjTYIiDKDYFLQY 224 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1019 

A DNA sequence (GBSx1087) was identified in S.agalactiae <SEQ ID 3137> which encodes the amino
acid sequence <SEQ ID 3138>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1643 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2161> which encodes the amino acid 
sequence <SEQ ID 2162>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1201 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 461/493 (93%), Positives = 478/493 (96%)

Query: 2 MSNWDTKFLKKGFTFDDVLLIPAESHVLPNEvDMKTKLADNLTIiNIPIITAAMDTVTDSK 61
MSNWDTKFLKKG+TFDDVIjLIPAESHvIiPNEvD+KTKIADNLTLNIPIITAAMDTVT SK
Sbjct: 1 MSNWDTKFLKKGYTFDDVLLIPAESHVLPNEVDLKTKLADNLTLNIPIITAAMDTVTGSK 60

Query: 62 MAIAIARAGGLGIIHKNMSIVDQAEEVRKVKRSENGVIIDPFFLTPDOTVSEAEELMQNY 121
MAIAI ARAGGLG+ 1 HKNMS I +QAEEVRKVKRSENGVI IDPFFLTP++ VSEAEELMQ Y
Sbjct: 61 MAIAIARAGGLGVIHKNMSITEQAEEVRKVKRSENGVIIDPFFLTPEHKVSEAEELMQRY 120

Query: 122 RISGVPIVETLENRKLVGIITNRDMRFISDYKQLISEHMTSQNLVTAPIGTDLETAERIL 181
RISGVPIVETli NRKLVGI ITNRDMRF I SDY ISEHMTS++LVTA +GTDLETAERIL
Sbjct: 121 RISGVPIVETLANRKLVGIITNRDMRFISDYNAPISEHMTSEHLVTAAVGTDLETAERIL 180

Query: 182 HEHRIEKLPLVDDEGRLSGLITIKDIEKVIEFPKAAKDEFGRLLVAGAVGVTSDTFERAE 241
HEHRIEKLPLVD+ GRLSGLITIKDIEKVIEFP AAKDEFGRLLVA AVGVTSDTFERAE
Sbjct: 181 HEHRIEKLPLVDNSGRLSGLITIKDIEKVIEFPHAAKDEFGRLLVAAAVGVTSDTFERAE 240

Query: 242 ALFEAGADAIVIDTAHGHSAGVLRKIAEIRAHFPNRTLIAGNIATAEGARALYDAGVDW 301
ALFEAGADAIVIDTAHGHSAGVLRKIAEIRAHFPNRTLIAGNIATAEGARALYDAGVDW
Sbjct: 241 ALFEAGADAIVIDTAHGHSAGVLRKIAEIRAHFPNRTLIAGNIATAEGARALYDAGVDW 300

Query: 302 KVGIGPGSICTTRWAGVGVPQITAIYDAAAVAREYGKTIIADGGIKYSGDIVKALAAGG 361
KVGIGPGSICTTRWAGVGVPQ+TAIYDAAAVAREYGKTIIADGGIKYSGDIVKAIAAGG
Sbjct: 301 KVGIGPGSICTTRWAGVGVPQVTAIYDAAAVAREYGKTIIADGGIKYSGDIVKALAAGG 360

Query: 362 NAVMLGSMFAGTDEAPGETEIFQGRKFKTYRGMGSIAAMKKGSSDRYFQGSVNEANKLVP 421
NA VMLGSMFAGTDEAPGETE I +QGRKFKTYRGMGS I AAMKKGS SDRYFQGS VNEANKLVP
Sbjct: 361 NAVMLGSMFAGTDEAPGETEIYQGRKFKTYRGMGSIAAMKKGSSDRYFQGSVNEANKLVP 420

Query: 422 EGIEGRVAYKGSVADIVFQMLGGIRSGMGYVGAflNIKELHDNAQFVEMSGAGLKESHPHD 481
EGIEGRVAYKG+ +DIVFQMLGGIRSGMGYVGA +I+ELH+NAQFVEMSGAGL ESHPHD
Sbjct: 421 EGIEGRVAYKGAASDIVFQMLGGIRSGMGYVGAGDIQELHENAQFVEMSGAGLIESHPHD 480

Query: 482 VQITNEAPNYSVH 494
VQITNEAPNYSVH
Sbjct: 481 VQITNEAPNYSVH 493





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1020 

A DNA sequence (GBSx1089) was identified in S.agalactiae <SEQ ID 3139> which encodes the amino
acid sequence <SEQ ID 3140>. This protein is predicted to be MutR. Analysis of this protein sequence 
reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0.1841 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD04237 GB:AF007761 MutR [Streptococcus mutans] 
Identities = 51/215 (23%), Positives = 102/215 (46%), Gaps = 9/215 (4%)

Query: 5 GKILKELREDKGISLSSLAKSAQLSKSTLSRFENGETQIGIDKFIKALQTLEVGVTINEV 64 

G++ KELR +G+ L +A+ LS S LS+FENG+T + DK I A+Q + +T +E 
Sbjct: 9 GELYKELRMARGLKLKDIARD-NLSVSQLSKFENGQTMLAADKLILAIQGIH--MTFSEF 65 

Query: 65 SILDSKVKAGTSNTDLEQLTLLESYRDNEDIMRIFSFQKQQSCDRIESNVLKILAKLFIS 124 

S ++ + ++L L++ +D + + +1 + + + K++ K + 

Sbjct: 66 SYAFTQYQESDLFKTGKmVELQTKKDIKGLKKILKDYPDTETYNVYNRLNKLVIKAA.VY 125 

Query: 125 NLGLNMRLPQDEINLVVTY]^GVTQYNDFYFKVICTFQDILPED--VIIMKI SNMT 178

+L + + +E + +YL + ++ ++ + IL +D V L K +

Sbjct: 126 SLDSSFEITNEEKEFLTSYLYAIEEWTEYELYLFGNTLFILSDDDLVFLGKAFVERDKLY 185 

Query: 179 KEQLPYSKSLVNLLIKQVIIALEKDSVDKAIVFAD 213 

+E + K +LI ++1 +E S A F + 
Sbjct: 186 RELSEHKKKAELVLINLILILVEHHSFYHAQYFIE 220 

There is also homology to SEQ ID 628. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1021 

A DNA sequence (GBSx1090) was identified in S.agalactiae <SEQ ID 3141> which encodes the amino
acid sequence <SEQ ID 3142>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have a cleavable N-term signal seq. 



INTEGRAL 


Likelihood 
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( 265 
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33 
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INTEGRAL 
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-6. 
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Transmembrane 
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( 176 
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INTEGRAL 
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.37 


Transmembrane 


117 


- 133 


( 113 


- 135) 


INTEGRAL 
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-5. 


.57 


Transmembrane 
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- 256 


( 232 


- 259) 
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56 


- 72 
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- 72) 



Final Results 

bacterial membrane Certainty=0.5310 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S. pyogenes <SEQ ID 3143> which encodes the amino acid 
sequence <SEQ ID 3144>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -10.99 Transmembrane 269 - 285 ( 264 - 286)
INTEGRAL Likelihood = -8.76 Transmembrane 117 - 133 ( 112 - 135)
INTEGRAL Likelihood = -7.70 Transmembrane 179 - 195 ( 174 - 200)
INTEGRAL Likelihood = -4.83 Transmembrane 34 - 50 ( 32 - 52)
INTEGRAL Likelihood = -4.46 Transmembrane 213 - 229 ( 211 - 230)
INTEGRAL Likelihood = -4.14 Transmembrane 240 - 256 ( 232 - 259)
INTEGRAL Likelihood = -0.69 Transmembrane 91 - 107 ( 91 - 108)
INTEGRAL Likelihood [...] 4 [...]

Final Results

bacterial membrane --- Certainty=0.5394 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9181> which encodes the amino acid sequence 
<SEQ ID 9182>. Analysis of this protein sequence reveals the following: 

Possible site: 38 
>>> Seems to have an uncleavable N-term signal seq 

Final Results

bacterial membrane --- Certainty=0.539 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 200/287 (69%), Positives = 244/287 (84%)

Query: 1 MEGLLIALIPMFAWGSIGFVSNKIGGRPNQQTFGMTLGALLFAIIWLFKQPEMTASLWI 60 
+EG+ ALIPMF WGSIGFVSNKIGG+P+QQT GMT GALLF++ VWL +PEMT LW+

Sbjct: 1 LEGIFYALIPMFTWGSIGFVSNKIGGKPSQQTLGMTFGALLFSLAVWLIVRPEMTLQLWL 60 

Query: 61 FGILGGILWSVGQNGQFQAMKYMGVSVANPLSSGAQLVGGSLVGALVFHEWTKPIQFILG 120 
FGILGG +WS+GQ GQF AM+YMGVSVANPLSSG+QLV GSL+G LVFHEWT+P+QF++G 
Sbjct: 61 FGILGGFIWSIGQTGQFHAMQYMGVSVANPLSSGSQLVLGSLIGVLVFHEWTRPMQFWG 120

Query: 121 LTALTLLVIGFYFSSKRDVSEQALATHQEFSKGFATIAYSTVGYISYAVLFNNIMKFDAM 180 

AL LL++GFYFSSK+D + + FSKGF + YST+GY+ YAVLFNNIMKF+ + 

Sbjct: 121 SLALLLLIVGFYFSSKQDDANAQVlfflLHNFSKGFRALTYSTIGYvMYAVLFNNIMKFEVL 180 

Query: 181 AVILPMAVGMCLGAICFMKFRWFFAVWKNMITGLMWGVGNVFMLLAAAKAGLAIAFSF 240 

+VILPMAVGM LGAI FM F+++ + V+KN + GL+WG+GN+FMLIAA+KAGLAIAFSF 
Sbjct: 181 SVILPMAVGMVLGAITFMSFKISIDQYVIKNSVVGLLWGIGNIFMLLARSKAGLAIAFSF 240 

Query: 241 SQLGVIISIIGGILFLGETKTKKEQKWVVMGILCFVMGAILLGIVKS 287

SQLG 1 1 S I +GGI LFLGETKTKKE +WW GI+CF++GAILLG+VKS 
Sbjct: 241 SQLGAIISIVGGILFLGETKTKKEMRWWTGIICFIVGAILLGWKS 287 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.
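
The alignment summaries above report raw counts together with percentages. The following is a minimal sketch, not part of the original analysis, of how such a summary line could be read programmatically; it assumes the `Identities = a/b (p%), Positives = c/d (q%)` layout used in this document, and the name `parse_alignment_summary` is hypothetical. It keeps both the printed percentage and the plain ratio, because the two do not always agree (244/287 is about 85.0%, while the summary for Example 1021 prints 84%), whether because of the alignment program's rounding or because of transcription.

```python
import re

# Matches one "Name = count/length (percent%)" field; tolerant of stray spaces.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {field: (count, length, printed_percent, computed_percent)}."""
    result = {}
    for name, count, length, printed in SUMMARY_RE.findall(line):
        count, length = int(count), int(length)
        # computed_percent is the plain ratio, with no rounding applied
        result[name] = (count, length, int(printed), 100.0 * count / length)
    return result


summary = parse_alignment_summary(
    "Identities = 200/287 (69%), Positives = 244/287 (84%)"
)
for name, (count, length, printed, computed) in summary.items():
    print(f"{name}: {count}/{length} printed {printed}% computed {computed:.1f}%")
```

Keeping the printed value next to the computed one makes it easy to flag summary lines whose figures may have been damaged in transcription.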

Example 1022 

A DNA sequence (GBSx1092) was identified in S.agalactiae <SEQ ID 3145> which encodes the amino
acid sequence <SEQ ID 3146>. This protein is predicted to be recF protein (recF). Analysis of this protein
sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2653 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3147> which encodes the amino acid 
sequence <SEQ ID 3148>. Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1677 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 248/364 (68%), Positives = 300/364 (82%), Gaps = 1/364 (0%)

Query: 1 MWIKNISLKHYRNYEEAQvDFSPNLNIFIGRNAQGKTNFLEAIYFLALTRSHRTRSDKEL 60 

MWIK + LKHYRNY+ FS LN+FIG NAQGKTNFLEAIYFL+LTRSHRTR+DKEL 

Sbjct: 1 ^IKELELKHYRNYDHL^FSSGIJWFIGNNAQGKTNFLEAIYFLSLTRSHRTRADKEL 60 

Query: 61 VHFKHHDVQITGEVIRKSGHLNLDIQLSEKGRITKVNHLKQAKLSDYIGAMTVVLFAPED 120

+HF H V +TG++ R SG ++L+I LS+KGR+TK+N LKQAKLSDYIG M WLFAPED 
Sbjct: 61 IHFDHSTVSLTGKIQRISGTVDLEINLSDKGRVTKINALKQAKLSDYIGTMMVVLFAPED 120 

Query: 121 LQLVKGAPSLRRKFLDIDIGQIKPTYLAELSNYNHVLKQRNTYLKTTNlWDKTFLTVIiDE 180 

LQLVKGAPSLRRKF+DID+GQIKP YL+ELS+YNHVLKQRN+YLK+ +D FL VLDE 
Sbjct: 121 LQLVKGAPSLRRKFIDIDLGQIKPVYLSELSHYNHVLKQRNSYLKSAQQIDAAFLAVLDE 180 

Query: 181 QLADYGSRVIEHRFDFIQALNDEADKHHYI ISTELEHLS IHYKSS IEFTDKSS IREHFLN 240 

QLA YG+RV+EHR DFI AL EA+ HH IS LE LS+ Y+SS+ F K++I + FL+ 
Sbjct: 181 QLASYGARVMEHRIDFINALEKEANTHHQAISNGLESLSLSYQSSWFDKKTNIYQQFLH 240 

Query: 241 QLSKSHSRDIFKKNTSIGPHRDDITFFINDINATFASQGQQRSLILSLKLAEIELIKTVT 300 

QL K+H +D F+KNTS+GPHRD++ F+IN +NA FASQGQ RSLILSLK+AE+ L+K +T 
Sbjct: 241 QLEKNHQKDFFRKOTSVGPHRDELAFYINGMNANFASQGQHRSLILSLKMAEVSLMKALT 300 

Query: 301 NDYPILLLDDVMSELDNHRQLKLLEG-IKENVQTFITTTSLEHLSALPDQLKIFNVSDGT 359 

D PILLLDDVMSELDN RQ KLLE IKENVQTFITTTSL+HLS LP+ ++IF+V+ GT 
Sbjct: 301 GDNPILLLDDVMSELDKTRQTKLLETVIKENVQTFITTTSLDHLSQLPEGIRIFHVTKGT 360 

Query: 360 ISIN 363 
+ 1+ 

Sbjct: 361 VQID 364 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1023 

A DNA sequence (GBSx1093) was identified in S.agalactiae <SEQ ID 3149> which encodes the amino
acid sequence <SEQ ID 3150>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.1807 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
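
The "Final Results" blocks in these examples share a fixed three-line layout, one line per predicted localization. As an illustration only, the sketch below reads such a block into a dictionary and picks the localization with the highest certainty. The function names `parse_final_results` and `most_likely` are hypothetical, and the pattern is deliberately tolerant of the stray spaces that appear inside some certainty values in this text.

```python
import re

# One localization line: site name, optional dashes, certainty, verdict in parentheses.
RESULT_RE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*Certainty\s*=\s*"
    r"(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {site: (certainty, verdict)} for each localization line found."""
    results = {}
    for site, whole, frac, verdict in RESULT_RE.findall(text):
        results[site] = (float(f"{whole}.{frac}"), verdict.strip())
    return results


def most_likely(results):
    """Return the site name with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


block = """
bacterial cytoplasm --- Certainty=0.1807 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(block)
print(most_likely(parsed), parsed)
```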

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA61548 GB:X89367 orf121 [Lactococcus lactis]
Identities = 56/116 (48%), Positives = 74/116 (63%), Gaps = 3/116 (2%)

Query: 3 YKLFDEYITLQSLLKEIGI IQSGGAIKKFLADNR- - VLFNGDLENRRGKKLRLGDIITIP 60 

Y LF+EYITL LLKE+G+I +GG K FLA+N + +NG+ ENRRGKKIiR GD++ P 
Sbjct: 4 YILFEEYITLGQLLKELGLISTGGQPKIFIAENEGNIFYNGEAENRRGKKLRDGDLLEFP 63 

Query: 61 DQNIEIIIRKPSDQEIEERNIEIAEKQRVSAIVKEMNKNTNKGKSKTSKKPVRFPG 116 

++++ + i+e E AE+ RV AIVK+MN NK K P RFPG 

Sbjct: 64 TFDLKVTFEQADADAIKEHEAEKAEEARVKAIVKKMNAE-NKTTKPAKKAPPRFPG 118 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3151> which encodes the amino acid 
sequence <SEQ ID 3152>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence



Final Results 

bacterial cytoplasm --- Certainty=0.0493 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 74/136 (54%), Positives = 94/136 (68%), Gaps = 20/136 (14%)

Query: 1 MDYKLFDEYITLQSLLKEIGIIQSGGAIKKFLADNRVLFNGDLENRRGKKLRLGDIITIP 60 
M YKLF E+ ITLQ+LLKE+GI IQSGGAIK FLA+ VLFNG+ E RRGKK+R+GD I++P

Sbjct: 9 MIYKLFTEFITLQALLKELGIIQSGGAIKGFLAETTVLFNGEDEKRRGKKIRVGDKISLP 68 

Query: 61 DQNI E 1 1 IRKPSDQE IEERNI E I AEKQRVSAIVKEMNKNTNKGKSK TSKK 110 

DQ++ I I +PS +E E+ E+AEK RV+A+VK+MN+ K SK T+KK 
Sbjct: 69 DQDLIITIVEPSQEEKEQFAEEMAEKTRVAALVKQMNQANKKTSSKHNNRQSTTKKSLRA 128

Query: 111 PVRFPG 116 

PVRFPG 

Sbjct: 129 TKKTKGKPTAPVRFPG 144 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1024 

A DNA sequence (GBSx1094) was identified in S.agalactiae <SEQ ID 3153> which encodes the amino
acid sequence <SEQ ID 3154>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.86 Transmembrane 269 - 285 ( 267 - 285)

Final Results

bacterial membrane --- Certainty=0.1744 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 3155> which encodes the amino acid
sequence <SEQ ID 3156>. Analysis of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3008 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 227/413 (54%), Positives = 309/413 (73%)

Query: 1 MKIVEGVSLHLIKNQQFKTNHLTFRFSGDFNNKTVARRSLVAQMLVTANAKYPKVQEFRE 60 
MKIV+GV LHLIK +QFKTNH+TFRFSGD N KTVA++ LVAQML TAN YP V++FRE 
Sbjct: 1 MKIVQGVQLHLIKTKQFKTNHITFRFSGDIjNQKTvAKKvLVAQMI^TANECYPTVRQFRE 60

Query: 61 KLASLYGASLSTKISTKGLWIVDIDIVFVKNTFTLEQENIVEQIITFLEDMLFSPLISL 120 

KLA LYGASLST + TKGLVHIVDIDI F+++ + E I++++I FL+D+LFSPL+S+ 
Sbjct: 61 KLARLYGASLSTNVLTKGLVHIVDIDITFIQDRYACNGEKILDEMIQFLKDILFSPLLSI 120 



Query: 121 EQYQTSIFDTEKKNLIQYLEADIEDNFYSSDLALKSLFYNNKTLRLPKYGTASLVESENS 180 

QYQ +F+TEK NLI Y+E+D ED+FY S L +K LFY NK L++ +YG+ L+ E + 
Sbjct: 121 AQYQPKVFETEKNNLINYIESDREDSFYYSSLKVKELFYCNKNLQMSEYGSPELIAKETA 180 



Query: 181 FTAYQEFQKMLKEDQLDIFWGDFDDYRMIQAFNRMAFEPRHKVLAFDYTQTYENITRSQ 240

+T+YQEF KML EDQ+DI F++GDFDDYR++Q ++ + R+K L F + Q NI + 
Sbjct: 181 YTSYQEFHKMLNEDQIDIFILGDFDDYRWQLIHQFPLDNRNKNLNFFHLQNSVNIIKES 240 



Query: 241 VEDKDVNQSIMQLAYHLPITYKDEDYFALIVFNGLFGAFAHSLLFTEIREKQGLAYTIGS 300 
+E + V+QSI+QLAYH P + DY+AL++ NGL G+FAHS LF +IRE++GLAY+IG
Sbjct: 241 IEKRAVHQSILQIAYHFPSVFGQRDYYALVLLNGLLGSFAHSRLFIKIREEEGLAYSIGC 300 



Query: 301 QFDSFTGLFTIYAGIDKEl^ERFLKLIKKQFNNIKMGRFSSTLLKQTKDILKMNYVIASD 360 

+FDS+TGLF IY GID ++R + L+LI ++ N IKMGRFS L+K+T+ +L N +L+ D 
Sbjct: 301 RFDSYTGLFEIYTGIDSQHRTKTLQIilIQEimiKMGRFSEQLIKKTRSMLIJSINALLSED 360 



Query: 361 NPKVIVDHIYHEHYLDQFHTSALFIDKVDDVTKSDIVSVATKLKLQAFYFLEG 413 

K I++ IY Y+D ++ +1 V++V K+DI+ VA LKLQ YFLEG 
Sbjct: 361 YNKNIIERIYRSSYIDSSYSIK1WIKGVNEVMKADIIKVANLLKLQTVYFLEG 413 



SEQ ID 3154 (GBS400) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 76 (lane 2; MW 49.2kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 84 (lane 3; MW 74kDa) and in Figure
177 (lane 6; MW 74kDa).

GBS400-GST was purified as shown in Figure 217, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1025 

A DNA sequence (GBSx1095) was identified in S.agalactiae <SEQ ID 3157> which encodes the amino
acid sequence <SEQ ID 3158>. Analysis of this protein sequence reveals the following:

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3473 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related DNA sequence was identified in S.pyogenes <SEQ ID 3159> which encodes the amino acid
sequence <SEQ ID 3160>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4298 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 207/424 (48%), Positives = 276/424 (64%), Gaps = 3/424 (0%)

Query: 5 KITYQNLQEEWKLTLESGLNVYLIPKPSFKETVGVLTANFGSLHTKYTRNGCVEHYPAG 64 

KI Y N+ E++Y + LE+GL VY I K F E +LT FGSL K T + PAG 
Sbjct: 6 KINYPNIDEDLYYVKLENGLTVYFIKKIGFIjEKTAMLTVGFGSLDNKLTVDDESRDAPAG 65 



Query: 65 IAHFLEHKLFELDKGQDAATQFTKYGAESNAFTTFDKTSFYFSTISHiraCLDILLDFVL 124 

IAHFLEHKLFE + G D + +FT+ GAE+NAFTTF++TSF+FST S L++L FVL 

Sbjct: 66 IAHFLEHKLFEDESGGDISLKFTQLGAETNAFTTFNQTSFFFSTASKFQENLELLQYFVL 125 



Query: 125 TTNFTEESITKEKDIIKQEIEMYQDDPEYRLYQGVLSNLYPNSPLAFDIAGDYQSISQIT 184 

+ N T+ES+++EK II QEI+MYQDD +YR Y G+L NL+P + LA DIAG SI +IT 
Sbjct: 126 SANITDESVSREKKIIGQEIDMYQDDADYRAYSGILQNLFPKTSLANDIAGSKASIQKIT 185 



Query: 185 LTDLQENHKDFYQLSNMNLVLVGQFSPQEIITYLQKNSHFTSY--SQNIDRDSISLEPVI 242

L+ +H FYQ +NM+L +VG E +Q+ SY + + D + PVI 
Sbjct: 186 KILLETHHTYFYQPTNMSLFIVGDIDIDETFLAIQRFQTTIiSYPDRKRVTVDPLHYYPVI 245

Query: 243 KIMSCHMTVTKPKliAIGYRKSNHMIHGSYLKEKIGLQLFFMLLGVWSTINQDWYESGQI 302 

K++S M VT KL +G+R + S L +1 L+LF +ML+GWTS I YE G+I 
Sbjct: 246 KSSSVDMDVTTAKLWGFRGYLTLTQHSLLTYRIALKLFLSMLIGWTSKIYHTLYEDGKI 305 

Query: 303 DDSFDIEIEVHPDFECVIISLDTTEPIAFSTQLRLLLKNALQSSDLTESHLKNVKRELYG 362 

DDSFD+++E+H +F+ V+ISLDT EPIA S +R Ii S + T HL +K+E+YG 

Sbjct: 306 DDSFDVDVEIHHNFQE^ISLDTPEPIAMSNYIRQKIATIKISKEFTlffiHni^LKKEMYG 365 

Query: 363 DFLRSLDSIENLAMQFVTYLYDG-KTMYLDLPSIVEELDLEDVITIGKDFLiDNADTSDFV 421 

DF++SLDSIE4L QF YL D K Y D+P I+E L L+DV+TIGK F + AD SDF 
Sbjct: 366 DFIQSLDSIEHLTHQFSLYLSDSDKETYFDIPKIIERLTLKDWTIGKAFFEKADASDFT 425 

Query: 422 IFPK 425

+FPK 

Sbjct: 426 VFPK 429 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1026 

A DNA sequence (GBSx1096) was identified in S.agalactiae <SEQ ID 3161> which encodes the amino
acid sequence <SEQ ID 3162>. This protein is predicted to be phosphatidylglycerophosphate synthase
(pgsA). Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.17 Transmembrane 17 - 33 ( 14 - 39)

INTEGRAL Likelihood = -3.77 Transmembrane 92 - 108 ( 88 - 108)

INTEGRAL Likelihood = -2.87 Transmembrane 144 - 160 ( 142 - 162)

INTEGRAL Likelihood = -1.65 Transmembrane 42 - 58 ( 42 - 59)

Final Results

bacterial membrane --- Certainty=0.4270 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10293> which encodes amino acid sequence <SEQ ID 
10294> was also identified. 
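
The transmembrane predictions listed above follow a regular one-line layout. The following sketch, offered only as an illustration, turns such lines into structured records. It assumes the layout `INTEGRAL Likelihood = <score> Transmembrane <start> - <end> ( <start> - <end>)`; the names `Segment` and `parse_segments` are hypothetical, and the parenthesised range is stored as printed, without interpreting it.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float   # score as printed; more negative values come first in the listings
    start: int          # first residue of the predicted segment
    end: int            # last residue of the predicted segment
    paren_start: int    # start of the range printed in parentheses
    paren_end: int      # end of the range printed in parentheses


SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text: str) -> List[Segment]:
    """Extract every INTEGRAL line found in text, in order of appearance."""
    return [
        Segment(float(score), int(a), int(b), int(c), int(d))
        for score, a, b, c, d in SEGMENT_RE.findall(text)
    ]


example = """
INTEGRAL Likelihood = -8.17 Transmembrane 17 - 33 ( 14 - 39)
INTEGRAL Likelihood = -3.77 Transmembrane 92 - 108 ( 88 - 108)
INTEGRAL Likelihood = -2.87 Transmembrane 144 - 160 ( 142 - 162)
INTEGRAL Likelihood = -1.65 Transmembrane 42 - 58 ( 42 - 59)
"""
segments = parse_segments(example)
strongest = min(segments, key=lambda s: s.likelihood)
print(len(segments), strongest)
```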

A related DNA sequence was identified in S.pyogenes <SEQ ID 3163> which encodes the amino acid
sequence <SEQ ID 3164>. Analysis of this protein sequence reveals the following:

Possible site: 48
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.64 Transmembrane 76 - 92 ( 72 - 102)
INTEGRAL Likelihood = -5.36 Transmembrane 136 - 152 ( 131 - 164)
INTEGRAL Likelihood = -2.34 Transmembrane 98 - 114 ( 97 - 114)

Final Results

bacterial membrane --- Certainty=0.3654 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 145/180 (80%), Positives = 160/180 (88%)

Query: 8 I^KKENIPl^LTVVRILMIPLFIVLTSVTTSTTWHIVAAIVFAIASLTDYLDGYLARKWQ 67

M+KKENI PNLLT+VRI MIP F+ +TS + WHI AA++FAIAS TDYLDGYLARKW 
Sbjct: 1 MIKKENIPNLLTLVRIAMIPFFLFITSSSNKVGI«JHIFAAVIFAIASFTDYLDGYLARKWH 60 

Query: 68 VVTNFGKFADPLADKMLVMSAFIMLVGLDl^PAWVSAIIICREIAWGLRLLLVETGGTV 127

V +NFGKFADPIADKMLVMSAFIMLVGL L PAWSA+IICREIAVTGLRLLLVETGG V 
Sbjct: 61 VASNFGKFADPLADKMLVMSAFIMLVGLGLVPAWVSAVIICRELAVTGLRLLLVETGGKV 120 

Query: 128 LAARMPGKIKTATQMFAVIFLLVHWMTLGNIMLYIALFFTLYSGYDYFKGAGFLFKDTFK 187

LAAAMPGKI KTATQM ++I LL HW+ LGN++LYIALFFT+YSGYDYFKGA FLFKDTFK 
Sbjct: 121 lAATiMPGKIKTATQMLSIILLLCHWIFLGNVLLYIALFFTIYSGYDYFKGASFLiFKDTFK 180 

A related GBS gene <SEQ ID 8705> and protein <SEQ ID 8706> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 4
SRCFLG: 0
McG: Length of UR: 9
Peak Value of UR: 3.03
Net Charge of CR: 1
McG: Discrim Score: 6.36
GvH: Signal Score (-7.5): -0.400001
Possible site: 48
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 49
ALOM program count: 2 value: -3.77 threshold: 0.0
INTEGRAL Likelihood = -3.77 Transmembrane 85 - 101 ( 81 - 101)
INTEGRAL Likelihood = -2.87 Transmembrane 137 - 153 ( 135 - 155)
PERIPHERAL Likelihood = 1.27 109
modified ALOM score: 1.25

icml HYPID: 7 CFP: 0.251

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2508 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1027 

A DNA sequence (GBSx1097) was identified in S.agalactiae <SEQ ID 3165> which encodes the amino
acid sequence <SEQ ID 3166>. This protein is predicted to be ABC transporter ATP-binding protein
(potA). Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1805 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC61484 GB:AF082738 ABC transporter ATP-binding protein
[Streptococcus pyogenes]
Identities = 201/279 (72%), Positives = 231/279 (82%)

Query: 1 MTNIITVNNLFFKYDSNQTOTQLENVSFHVKCGF^SIIGHNGSGKSTTVRLIDGLLEAE 60 
M+ II + + F Y +Q L+ VSFHVKQGEWLSI IGHNGSGKSTT+RLIDGLLE E

Sbjct: 18 MSAIIELKKVTFNYHKDQEKPTLDGVSFHVKQGEWLSIIGHNGSGKSTTIRLIDGLLEPE 77 



Query: 61 SGQIIIDGQELTEDNVWELRHKIGMVFQNPDNQFVGATVEDDVAFGLENKGIPLKDMKER 120 
SG II+DG LT NVWE+RHKIGMVFQNPDNQFVGATVEDDVAFGLENKGI +D+KER 
Sbjct: 78 SGSIIVDGDLLTITNVWEIRHKIGMWQNPDNQFVGATVEDDVAFGLENKGIAHEDIKER 137

Query: 121 VTJQALDLVGMSEFKMREPARLSGGQKQRVAIAGAVAMRPQVIILDEATSMLDPEGRLELI 180 

V+ AL+LVGM FK +EPARLSGGQKQRVAIAGAVAM+P++IILDEATSMLDP+GRLELI 
Sbjct: 138 VNHALELVGMQNFKEKEPARLSGGQKQRVAIAGAVAMKPKI I ILDEATSMLDPKGRLELI 197 

Query: 181 RTIRAIRQKYNLTVISITHDLDEVALSDRV1VMKNGKVESTSTPKALFGRGNRLISLGLD 240 

+TI+ IR Y LTVISITHDLDEVALSDRV+VMK+G+VESTSTP+ LF RG+ L+ LGLD 
Sbjct: 198 KTIKNIRDDYQLWISITHDLDEVALSDRVLWKDGQVESTSTPEQLFARGDELLQLGLD 257 

Query: 241 VPFTSRLMAELAANGLDIGTEYLTEKELEEQLWEIiNLKM 279 

+PFT+ ++ L G I YLTEKELE QL +L KM 
Sbjct: 258 IPFTTSWQMLQEEGYPIDYGYLTEKELENQLCQLISKM 296 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3167> which encodes the amino acid
sequence <SEQ ID 3168>. Analysis of this protein sequence reveals the following:

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2235 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 247-249

An alignment of the GAS and GBS proteins is shown below.

Identities = 200/279 (71%), Positives = 231/279 (82%)

Query: 1 MTOIITVNNLFFKYDSNQTHYQLENVSFHVKQGEWLSIIGHNGSGKSTTVRLIDGLLEAE 60

M+ II + + F Y +Q L+ VSFHVKQGEWLSIIGHNGSGKSTT+RLIDGLLE E 
Sbjct: 18 MSAIIELKKVTFNYHKDQEKPTLDGVSFHVKQGEWLSIIGHNGSGKSTTIRLIDGLLEPE 77 

Query: 61 SGQIIIDGQELTEDNVWELRHKIGMVFQNPDNQFVGATVEDDVAFGLENKGIPLKDMKER 120 
SG II+DG LT NVWE+RHKIGMVFQNPDNQFVGATVEDDVAFGLENKGI +D+KER

Sbjct: 78 SGSIIVDGDLLTITNVWEIRHKIGMVFQNPDNQFVGATVEDDVAFGLENKGIAHEDIKER 137 

Query: 121 VDQALDLVGMSEFKMREPARLSGGQKQRVAIAGAVAMRPQVIILDEATSMLDPEGRLELI 180 
V+ AL+LVGM FK +EPARLSGGQKQRVAIAGAVAM+P++IILDEATSMLDP+GRLELI 
Sbjct: 138 VNHALELVGMQNFKEKEPARLSGGQKQRVAIAGAVAMKPKI I ILDEATSMLDPKGRLELI 197

Query: 181 RTIRAIRQKYNLTVISITHDLDEVALSDRVIVMKNGKVESTSTPKALFGRGNRLISLGLD 240 

+TI+ IR Y LTVISITHDLDEVALSDRV+VMK+G+VESTSTP+ LF RG+ L+ LGLD 
Sbjct: 198 KTIKNIRDDYQLTVISITHDLDEVALSDRVLVMKDGQVESTSTPEQLFARGDELLQLGLD 257 

Query: 241 VPFTSRLMAELAANGLDIGTEYLTEKELEEQLWELNLKM 279 

+PFT+ ++ L G + YLTEKELE QL +L KM 
Sbjct: 258 IPFTTSWQMLQEEGYPVDYGYLTEKELENQLCQLISKM 296 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1028 

A DNA sequence (GBSx1098) was identified in S.agalactiae <SEQ ID 3169> which encodes the amino
acid sequence <SEQ ID 3170>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 154 - 170 ( 154 - 170)

Final Results

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11922 GB:Z99104 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 141/242 (58%), Positives = 188/242 (77%), Gaps = 1/242 (0%)

Query: 16 TPFEGRALFDVNLKIEDASYTAFIGHTGSGKSTIMQLLNGLHIPTKGEVIVDDFSIKAGD 75
TPFE AL+D+N I++ SY A IGHTGSGKST++Q LNGL PTKG++ + I+AG
Sbjct: 3 TPFERLALYDINASIKEGSYVAVIGHTGSGKSTLLQHLNGLLKPTKGQISLGSTVIQAGK 62

Query: 76 KNKEIKFIRQKVGLVFQFPESQLFEETVLKDVAFGPQNFGISQIEAERIAEEKLRLVGIS 135
KNK++K +R+KVG+VFQFPE QLFEETVLKD++FGP NFG+ + +AE+ A E L+LVG+S
Sbjct: 63 KNKDLKKLRKKVGIVFQFPEHQLFEETVLKDISFGPMNFGVKKEDAEQKAREMLQLVGLS 122

Query: 136 [...]
E+L D++PFELSGGQMRRVAIAG+LAM+P+VLVLDEPTAGLDP+GRKE+M +F LH++G
Sbjct: 123 [...]

Query: 196 [...]
+T +LVTH M+D A YAD + V+ G + SG P+ +F + E + L +P+ KF +
Sbjct: 183 [...]

Query: 255 [...]
Sbjct: 243 [...]

A related DNA sequence was identified in S.pyogenes <SEQ ID 3171> which encodes the amino acid
sequence <SEQ ID 3172>. Analysis of this protein sequence reveals the following:

Possible site: 40

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 154 - 170 ( 154 - 170)

Final Results

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAB11922 GB:Z99104 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 146/259 (56%), Positives = 187/259 (71%), Gaps = 2/259 (0%)

Query: 16 TPFEGRALFNINLDILDGSYTAFIGHTGSGKSTIMQLLNGLHVPTTGIVSVDKQDITNHS 75
TPFE AL++IN I +GSY A IGHTGSGKST++Q LNGL PT G +S+ I
Sbjct: 3 TPFERLALYDINASIKEGSYVAVIGHTGSGKSTLLQHLNGLLKPTKGQISLGSTVIQAGK 62

Query: 76 KNKEIKSIRKHVGLVFQFPESQLFEETVLKDVAFGPQNFGVSPEEAEALAREKLALVGIS 135
KNK++K +RK VG+VFQFPE QLFEETVLKD++FGP NFGV E+AE ARE L LVG+S
Sbjct: 63 KNKDLKKLRKKVGIVFQFPEHQLFEETVLKDISFGPMNFGVKKEDAEQKAREMLQLVGLS 122

Query: 136 [...]
E L + + + PFELSGGQMRR VAIAG+LAM P+VLVLDEPTAGLDP+GRKE+M +F +LHQ G
Sbjct: 123 [...]

Query: 196 [...]
+T +LVTH M+D A YAD + V+ KG I SG P+ +F + + L +P+ K +
Sbjct: 183 [...]

Query: 255 [...]
G+ + +T+E+
Sbjct: 243 [...]

An alignment of the GAS and GBS proteins is shown below.

Identities = 218/280 (77%), Positives = 241/280 (85%)

Query: 1 MGIEFKNVSYTYQAGTPFEGRALFDVNLKIEDASYTAFIGHTGSGKSTIMQLLNGLHIPT 60 

M I +NVSYTYQAGTPFEGRALF++NL I D SYTAFIGHTGSGKSTIMQLLNGLH+PT 
Sbjct: 1 MSINLQNVSYTYQAGTPFEGRALFNINLDILDGSYTAFIGHTGSGKSTIMQLLNGLHVPT 60 

Query: 61 KGEVIVDDFSIKAGDKNKEIKFIRQKVGIiVFQFPESQLFEETVLKDVAFGPQNFGISQIE 120 

G V VD I KNKEIK 1R+ VGLVFQFPESQLFEETVLKDVAFGPQNFG+S E 
Sbjct: 61 TGIVSVDKQDITNHSKNKEIKSIRKHVGLVFQFPESQLFEETVLKDVAFGPQNFGVSPEE 120 

Query: 121 AERIAEEKLRLVGISEDLFDKNPFELSGGQMRRVAIAGILAMEPKVLVLDEPTAGLDPKG 180 

AE LA EKL LVGISE+LF+KNPFELSGGQMRRVAIAGILAM+PKVLVLDEPTAGLDPKG 
Sbjct: 121 AEALAREKLALVGISENLFEKNPFELSGGQMRRVAIAGILAMQPKVLVLDEPTAGLDPKG 180 

Query: 181 RKELMTLFKNLHKKGMTIVLWHLMDDVADYADYVYVLEAGKVTLSGQPKQIFQEVELLE 240 

RKELMT+FK LH+ GMTI VLVTHLMDDVA+YAD+VYVL+ GK+ LSG+PK IFQ+V LLE 
Sbjct: 181 RKELMTIFKKLHQSGMTIVLVTHLMDDVANYADFvYVLDKGKIILSGKPKTIFQQVSLLE 240 

Query: 241 SKQLGVPKITKFAQRLSHKGLNLPSLPITINEFVEAIKHG 280 

KQLGVPK+TK AQRL +G+ + SLPIT+ E E +KHG 
Sbjct: 241 KKQLGVPKVTKLAQRLVDRGIPISSLPITLEELREVLKHG 280 

SEQ ID 3170 (GBS401) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 76 (lane 3; MW 34.4kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 84 (lane 4; MW 59kDa). 

GBS401-GST was purified as shown in Figure 218, lane 2. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1029 

A DNA sequence (GBSx1099) was identified in S.agalactiae <SEQ ID 3173> which encodes the amino
acid sequence <SEQ ID 3174>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

[...]
INTEGRAL Likelihood = -0.43 Transmembrane 199 - 215 ( 199 - [...]

A related GBS nucleic acid sequence <SEQ ID 8707> which encodes amino acid sequence <SEQ ID 8708>
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7
SRCFLG: 0
McG: Length of UR: 8
Peak Value of UR: 0.65
Net Charge of CR: 1
McG: Discrim Score: -10.55
GvH: Signal Score (-7.5): 1.45

Final Results

bacterial membrane --- Certainty=0.5182 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Possible site: 37
>>> Seems to have no N-terminal signal sequence
Amino Acid Composition: calculated from 1
ALOM program count: 6 value: -10.46 threshold: 0.0
modified ALOM score: 2.59
icml HYPID: 7 CFP: 0.518

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5182 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB11923 GB:Z99104 ybaF [Bacillus subtilis]
Identities = 133/263 (50%), Positives = 191/263 (72%)

Query: 7 MDKLILGRYIPGNSLIHKLDPRSKLLAMLLFI I IVFWANNWTNVI VFI FTLVIVGLSQI 66 

MD +I+G+Y+PG SL+H+LDPR+KL+ + LF+ IVF ANNV T ++ +FT+ +V L+++ 
Sbjct: 2 MDSMIIGKYVPGTSLVHRLDPRTKLITIFLFVCIVFLANNVQTYALLGLFTIGWSLTRV 61 

Query: 67 KFSYFFNGIKPMVGIILFTTLFQMLFAQGGQVIFSFWIFSITSLGLQQAALIFMRFVLII 126 

FS+ G+KP++ I+LFT L +L G +IF + GL Q I +RFV +1 

Sbjct: 62 PFSFLMKGLKPIIWIVLFTFLnHILMTHEGPIIFQIGFSRVYEGGLVQGIFISLRFVYLI 121 

Query: 127 FFSTLLTLTTTPLSLADAVESLLKPLErraRVPAHEIGLMLSLSLRFVPTLMDDTTRIMNA 186 

+TLLTLTTTP+ + D +E LL PL+ L++P HE+ LM+S+SLRF+PTLM++T +IM A 
Sbjct: 122 LITTLLTLTTTPIEITDGMEQLLNPLKKLKLPVHELALMMSISLRFIPTLMEETDKIMKA 181 

Query: 187 QRARGVDFGEGNLIHKVKSIIPILIPLFASSFKRADALAIAMEARGYQGGANRSKYRLLK 246 

Q ARGVDF G + +VK+I+P+L+PLF S+FKRA+ LA+AMEARGYQGG R+KYR L 
Sbjct: 182 QMARGVDFTSGPVKERVKAIVPLLVPLFVSAFKRAEELAVAMEARGYQGGEGRTKYRKLV 241 

Query: 247 WTVRDTFS ILLMLLLGLSLFLLK 269 

WT +DT 1+ +++L LF L+ 
Sbjct: 242 WTGKDTSVIVSLIVLAALLFSLR 264 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3175> which encodes the amino acid 
sequence <SEQ ID 3176>. Analysis of this protein sequence reveals the following: 

Possible site: 53 
>>> Seems to have no N-terminal signal sequence 

[...] 62 - 78 ( 62 - 78)
INTEGRAL Likelihood = -0.27 Transmembrane 193 - 209 ( 193 - 209)

Final Results

bacterial membrane --- Certainty=0.4800 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



>GP:CAB11923 GB:Z99104 ybaF [Bacillus subtilis] 
Identities = 138/263 (52%), Positives = 195/263 (73%)

Query: 1 MDKLILGRYIPGDSLIHRLDPRSKLLAMIIYIVIIFWANNVVTNLLMLTFTLAVVFLSKI 60

MD +I+G+Y+PG SL+HRLDPR+KL+ + +++ I+F ANNV T L+ FT+ W L+++ 
Sbjct: 2 MDSMIIGKWPGTSLVHRLDPRTKLITIFLWCIVFLANNVQTYALLGLFTIGVVSLTRV 61 

Query: 61 KLSFFLNGVKPMIGIILFTTLFQMFFSQGGKVIFSWWFISITDLGLSQAILIFMRFVLII 120 

SF + G+KP+I I+LFT L + + G +IF F + + GL Q I I +RFV +1 
Sbjct: 62 PFSFLMKGLKPIIWIVLFTFLLHILMTHEGPIIFQIGFSRVYEGGLVQGIFISLRFVYLI 121 

Query: 121 FFSTLLTLTTTPLSLSDAVESLLKPLTRFKVPAHEIGLMLSLSLRFVPTLMDDTTRIMNA 180

+TLLTLTTTP+ ++D +E LL PL + K+P HE+ LM+S+SLRF+PTLM++T +IM A 
Sbjct: 122 LITTLLTLTTTPIEITDGMEQLLNPLKKLKLPVHELALMMSISLRFIPTLMEETDKIMKA 181 

Query: 181 QRARGVDFGEGNLIQKVKSIIPILIPLFASSFKRADAIiAIAMEARGYQGGEGRTKYRQLD 240 
Q ARGVDF G + ++VK+I+P+L+PLF S+FKRA+ LA+AMEARGYQGGEGRTKYR+L

Sbjct: 182 QMARGVDFTSGPVKERVKAIVPLLVPLFVSAFKRAEEIAVAMEARGYQGGEGRTKYRKLV 241 

Query: 241 WQLKDSLAIGIVSLLGLLLFFLK 263 
W KD+ I + +L LLF L+ 
Sbjct: 242 WTGKDTSVIVSLIVLAALLFSLR 264

An alignment of the GAS and GBS proteins is shown below. 

Identities = 210/263 (79%), Positives = 237/263 (89%)

Query: 7 MDKLILGRYI PGNSLIHKLDPRSKLLAMLLFI 1 IVFWANNWTNVI VFIFTLVIVGLSQI 66

MDKLILGRYIPG+SLIH+LDPRSKIirAM+++I+I+FWANNVVTN+++ FTL +V LS+I 
Sbjct: 1 MDKLILGRYIPGDSLIHRLDPRSKLLAMIIYIVIIFWAMNVVTmjLMLTFTIAWFLSKI 60 

Query: 67 KFSYFFNGIKPMVGIILFTTLFQMLFAQGGQVIFSFWIFSITSLGLQQAALIFMRFVLII 126 
K S+F NG+KPM+GIILFTTLFQM F+QGG+VIFS+W SIT LGL OA LIFMRFVLII

Sbjct: 61 KLSFFIjNGVKPMIGIILFTTLFQMFFSQGGKVIFSWWFISITDLGLSQAILIFMRFVLII 120 

Query: 127 FFSTLLTLTTTPLSLADAvESLLKPLEVLRvPAHEIGLMLSLSLRFVPTLMDDTTRIMNA 186 
FFSTLLTLTTTPLSL+DAVESLLKPL +VPAHEIGIMIjSLSLRFVPTLMDDTTRIMNA 
Sbjct: 121 FFSTLLTLTTTPLSLSDAVESLLKPLTRFKVPAHEIGLMLSLSLRFVPTLMDDTTRIMNA 180

Query: 187 QRARGVDFGEGNLIHKVKSIIPILIPLFASSFKRADALAIAMEARGYQGGANRSKYRLLK 246 

QRARGVDFGEGNLI KVKSIIPILIPLFASSFKRADALAIAMEARGYQGG R+KYR L 
Sbjct: 181 QRARGVDFGEGNLIQKVKS 1 1 P ILI PLFAS S FKRADALAI AMEARGYQGGEGRTKYRQLD 240 



Query: 247 WTVRDTFSILLMLLLGLSLFLLK 269 

W ++D+ +1 ++ LLGL LF LK 
Sbjct: 241 WQLKDSLAIGIVSLLGLLLFFLK 263 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1030 

A DNA sequence (GBSx1101) was identified in S.agalactiae <SEQ ID 3179> which encodes the amino
acid sequence <SEQ ID 3180>. This protein is predicted to be unnamed protein product. Analysis of this
protein sequence reveals the following:

Possible site: 45

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-12.05 Transmembrane 22 - 38 ( 16 - 43)

Final Results

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
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A related DNA sequence was identified in S.pyogenes <SEQ ID 3181> which encodes the amino acid 
sequence <SEQ ID 3182>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

5 »> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0. 3000 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

10 bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 116/233 (49%) , Positives = 140/233 (59%) , Gaps = 39/233 (16%) 

Query: 9 KIJWKKHHLAYGAITLVALFSCIIAV^ 61
K N+K+ + +G LVAL ILA++ F S T+S +K + ++ K
Sbjct: 4 KENLKQRYFNFGLVALALTILAIIFAFSSKNADTKSYAKKSESKMVTIDKAPKNNHA 60

Query: 62 MTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVVTENTP 121
+TK SK K + + P P+ ++ AP T +EE V Q VT
Sbjct: 61 ITKEESKEKAKSIASEPIPTVENSVAP TVTEEVPWQQEVT 101

Query: 122 ATSQAQQAYAVTETTYRPAQHQTSGQVLSNGNTAGAIGSAAAAQMAAATGVPQSTWEHII 181
Q V+ Y P + VLSNGNTAG +GS AAAQMAAATGVPQSTWEHII
Sbjct: 102 QTVQQVSSVAYNP NNVVLSNGNTAGIVGSQAAAQMAAATGVPQSTWEHII 151

Query: 182 ARESNGNPNVANASGASGLFQTMPGWGSTATVQDQVNSAIKAYRAQGLSAWGY 234
ARESNGNPN ANASGASGLFQTMPGWGSTATV+DQVN+A+KAY AQGLSAWGY
Sbjct: 152 ARESNGNPNAANASGASGLFQTMPGWGSTATVEDQVNAALKAYSAQGLSAWGY 204

A related GBS gene <SEQ ID 8713> and protein <SEQ ID 8714> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 2.48
GvH: Signal Score (-7.5): -3.74
Possible site: 45
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -12.05 threshold: 0.0
INTEGRAL Likelihood =-12.05 Transmembrane 22 - 38 ( 16 - 43)
PERIPHERAL Likelihood = 4.29 156
modified ALOM score: 2.91

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.5819 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
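
Each "Final Results" block lists three candidate compartments with a certainty value, and the predicted localisation is the entry marked "(Affirmative)", which in the blocks shown here is also the one with the highest certainty. The following Python sketch shows one way such a block could be read programmatically. It is an illustration only: the names `parse_final_results` and `predicted_location` are hypothetical, and the pattern is written to tolerate the spacing seen in this text (e.g. "Certainty=0 . 5819").

```python
import re

# One result line looks like:
#   bacterial membrane Certainty=0.5819 (Affirmative) < succ>
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*(\d)\s*\.\s*(\d+)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Return a list of (compartment, certainty, verdict) tuples."""
    results = []
    for compartment, whole, frac, verdict in RESULT_RE.findall(text):
        results.append((compartment, float(f"{whole}.{frac}"), verdict.strip()))
    return results

def predicted_location(text):
    """Return the entry with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1]) if results else None

if __name__ == "__main__":
    block = """bacterial membrane Certainty=0 . 5819 (Affirmative) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>"""
    print(predicted_location(block))
```
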

The protein has homology with the following sequences in the databases:

61.8/68.7% over 114aa
GP|7959131| secretory protein SAI-B Insert characterized
Staphylococcus aureus

ORF01057(664 - 1002 of 1302)
GP|7959131|dbj|BAA95959.1||AB042839 (119 - 233 of 233) secretory protein SAI-B
{Staphylococcus aureus}
%Match = 15.1
%Identity = 61.7 %Similarity = 68.7
Matches = 71 Mismatches = 34 Conservative Sub.s = 8



438 468 498 528 558 588 618 648
IFKSSQVTTESLSKOTKWVAKKSKMTKATSKSKVEDVKQAPKPSQASNEAPKSSSQSTEANSQQQVTASEEAAVEQAVV

VDQAHLVDLAHNHQDQLNAAPIKDGAYDIHFVKre
50 60 70 80 90 100 110

678 708 735 762 792 822 852 882
TENTPATSQAQQAYAVTETTYRP-AQHQTSGQV-LSNGNTAGAIGSAAAAQMAAATGVPQSTWEHIIARESNGNPNVANA
: : I I lh II = II I lllllllll 11 = 11 II Mil III 11111111 I I
SVSyNAQSSNSNVEAVSAPTYHOTSTSTTSSSTOLSNGNTAGATGSSAAQIMAQRTGVPASTWAAIIARESNGQVNAYNP
130 140 150 160 170 180 190

912 942 972 1002 1032 1062 1092 1122
SGASGLFQTMPGWGSTATVQDQVNSAIKAYRAQGLSAWGY**IAIN*L^
llllllllllllll I II l=|:|:||l=llll 111=
SGASGLFQTMPGWGPTNTVDQQINAAVKAYKAQGLGAWGF

SEQ ID 3180 (GBS25) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 14 (lane 5; MW 25kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 16 (lane 11; MW 50kDa), Figure 63
(lane 6; MW 50.3kDa), Figure 66 (lane 6; MW 50kDa) and in Figure 175 (lanes 8 & 9; MW 50kDa).

Purified GBS25-GST is shown in Figure 9A, Figure 193 (lane 11) and Figure 210 (lane 5).

The purified GBS25-GST fusion product was used to immunise mice (lane 1+2+3 products; 20µg/mouse).
The resulting antiserum was used for Western blot (Figure 95B), FACS (Figure 95C), and in the in vivo
passive protection assay (Table III). These tests confirm that the protein is immunoaccessible on GBS
bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1031 

A DNA sequence (GBSx1103) was identified in S.agalactiae <SEQ ID 3183> which encodes the amino
acid sequence <SEQ ID 3184>. This protein is predicted to be L-serine dehydratase 1 (sdaA-2). Analysis of
this protein sequence reveals the following:

Possible site: 61

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.85 Transmembrane 205 - 221 ( 205 - 221)
INTEGRAL Likelihood = -0.59 Transmembrane 171 - 187 ( 171 - 187)
INTEGRAL Likelihood = -0.53 Transmembrane 226 - 242 ( 226 - 242)

Final Results

bacterial membrane Certainty=0.1341 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB13459 GB:Z99112 similar to L-serine dehydratase [Bacillus subtilis]
Identities = 176/289 (60%), Positives = 224/289 (76%), Gaps = 1/289 (0%)

Query: 1 MFYTIEELVEQANSQHKGNIAELMIQTEIEMTGRSREEIRYIMSRNLEVMKASVIDGLTP 60
MF ++EL+E + + I+++MI E+E+T +++E+I M NL VM+A+V GL
Sbjct: 1 MFRNVKELIE-ITKEKQILISDVMIAQEMEVTEKTKEDIFQQMDHNLSVMEAAVQKGLEG 59

Query: 61 SKSISGLTGGDAVKMDQYLQSGKTISDTTILAAVRNAMAVNELNAKMGLVCATPTAGSAG 120
S +GLTGGDAVK+ Y++SGK++S IL AV A+A NE+NA MG +CATPTAGSAG
Sbjct: 60 VTSQTGLTGGDAVKLQAYIRSGKSLSGPLILDAVSKAVATNEVNAAMGTICATPTAGSAG 119

Query: 121 CLPAVISTAIEKLNLTEEEQLDFLFTAGAFGLVIGNNASISGAEGGCQAEVGSASAMAAA 180
+P + EKLN T E+ + FLFTAGAFG V+ NNASISGA GGCQAEVGSAS MAAA
Sbjct: 120 VVPGTLFAVKEKLNPTREQMIRFLFTAGAFGFVVANNASISGAAGGCQAEVGSASGMAAA 179

Query: 181 ALVI^GGTPFQASQAIAFVIKNMLGLICDPVAGLVEVPCVTaOTALGSSFALVAADMALA 240
A+V AGGTP Q+++A+A +KNMLGL+CDPVAGLVEVPCTKRNA+G+S A++AADMALA
Sbjct: 180 AIVEMAGGTPEQSAEAMAITLKNMLGLVCDPVAGLVEVPCTKRNAMGASNAMIAADMALA 239

Query: 241 GIESQIPVDEVIDAMYQVGSSLPTAFRETAEGGLAATPTGRRYSKEIFG 289
GI S+IP DEVIDAMY++G ++PTA RET +GGLAATPTGR K+IFG
Sbjct: 240 GITSRIPCDEVIDAMYKIGQTMPTALRETGQGGLAATPTGRELEKKIFG 288

A related DNA sequence was identified in S.pyogenes <SEQ ID 3185> which encodes the amino acid
sequence <SEQ ID 3186>. Analysis of this protein sequence reveals the following:

Possible site: 55

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.12 Transmembrane 196 - 212 ( 196 - 213)
INTEGRAL Likelihood = -0.27 Transmembrane 226 - 242 ( 226 - 242)

Final Results

bacterial membrane Certainty=0.1447 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB13459 GB:Z99112 similar to L-serine dehydratase [Bacillus subtilis]
Identities = 173/289 (59%), Positives = 222/289 (75%), Gaps = 1/289 (0%)

Query: 1 MFYTIEELVKQADQQFNGNIAELMIATEVEMSGRNREDIIKIMSRNLQVMKAAVTEGLTS 60
MF ++EL++ ++ I+++MIA E+E++ + +EDI + M NL VM+AAV +GL
Sbjct: 1 MFRNVKELIEITKEK-QILISDVMIAQEMEVTEKTKEDIFQQMDHNLSVMEAAVQKGLEG 59

Query: 61 TKSISGLTGGDAVKMDNYIKKGNSLSDTTILNAVKNAIAVNELNAKMGLVCATPTAGSAG 120
S +GLTGGDAVK+ YI+ G SLS IL+AV A+A NE+NA MG +CATPTAGSAG
Sbjct: 60 VTSQTGLTGGDAVKLQAYIRSGKSLSGPLILDAVSKAVATNEVNAAMGTICATPTAGSAG 119

Query: 121 CLPAVLATAIEKLDLSEKEQLEFLFTAGAFGLVIGNNASISGAEGGCQAEVGSAAAMSAA 180
+P L EKL+ + ++ + FLFTAGAFG V+ NNASISGA GGCQAEVGSA+ M+AA
Sbjct: 120 VVPGTLFAVKEKLNPTREQMIRFLFTAGAFGFVVANNASISGAAGGCQAEVGSASGMAAA 179

Query: 181 ALVKAAGGTSHQASQAIAFVIKNLLGLVCDPVAGLVEVPCVKRNALGASFALVAADMALA 240
A+V+ AGGT Q+++A+A +KN+LGLVCDPVAGLVEVPCVKRNA+GAS A++AADMALA
Sbjct: 180 AIVEMAGGTPEQSAEAMAITLKNMLGLVCDPVAGLVEVPCTKRNAMGASNAMIAADMALA 239

Query: 241 DIDSQIPVDEVIDAMYQVGSAMPTAFRETAEGGLAATPTGRRYSVEIFG 289
I S+IP DEVIDAMY++G MPTA RET +GGLAATPTGR +IFG
Sbjct: 240 GITSRIPCDEVIDAMYKIGQTMPTALRETGQGGLAATPTGRELEKKIFG 288

An alignment of the GAS and GBS proteins is shown below.

Identities = 244/290 (84%), Positives = 273/290 (94%)

Query: 1 MFYTIEELVEQANSQHKGNIAELMIQTEIEMTGRSREEIRYIMSRNLEVMKASVIDGLTP 60
MFYTIEELV+QA+ Q GNIAELMI TE+EM+GR+RE+I IMSRNL+VMKA+V +GLT
Sbjct: 1 MFYTIEELVKQADQQFNGNIAELMIATEVEMSGRNREDIIKIMSRNLQVMKAAVTEGLTS 60

Query: 61 SKSISGLTGGDAVKMDQYLQSGKTISDTTILAAVRNAMAVNELNAKMGLVCATPTAGSAG 120
+KSISGLTGGDAVKMD Y++ G ++SDTTIL AVRNA+AVNELNAKMGLVCATPTAGSAG
Sbjct: 61 TKSISGLTGGDAVKMDNYIKKGNSLSDTTILNAVRNAIAVNELNAKMGLVCATPTAGSAG 120

Query: 121 CLPAVISTAIEKLNLTEEEQLDFLFTAGAFGLVIGNNASISGAEGGCQAEVGSASAMAAA 180
CLPAV++TAIEKL+L+E+EQL+FLFTAGAFGLVIGNNASISGAEGGCQAEVGSA+AM+AA
Sbjct: 121 CLPAVLATAIEKLDLSEKEQLEFLFTAGAFGLVIGNNASISGAEGGCQAEVGSAAAMSAA 180

Query: 181 ALVMA&GGTPFQASQAIAFVIKNMLGLICDPVAGLVEVPCT 240
ALV AAGGT QASQAIAFVIKN+LGL+CDPVAGLVEVPCVKRNALG+SFALVAADMALA
Sbjct: 181 ALVKAAGGTSHQASQAIAFVIKNLLGLVCDPVAGLVEVPCVKRNALGASFALVAADMALA 240

Query: 241 GIESQIPVDEVIDAMYQVGSSLPTAFRETAEGGLAATPTGRRYSKEIFGE 290
I+SQIPVDEVIDAMYQVGS++PTAFRETAEGGLAATPTGRRYS EIFGE
Sbjct: 241 DIDSQIPVDEVIDAMYQVGSAMPTAFRETAEGGLAATPTGRRYSVEIFGE 290
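
Each alignment row carries its own start and end coordinates, so the number of residue letters in a row should equal end - start + 1 once gap characters are excluded. This gives a simple consistency check for rows that may have been damaged in transcription. The Python sketch below is an illustration only, assuming rows in the "Query:"/"Sbjct:" layout used above; `check_row` is a hypothetical helper name.

```python
import re

# A row looks like: "Query: 241 WQLKDSLAIGIVSLLGLLLFFLK 263"
ROW_RE = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S.*?)\s+(\d+)\s*$")

def check_row(line):
    """Return (label, expected_residues, counted_residues, consistent) or None."""
    m = ROW_RE.match(line.strip())
    if m is None:
        return None
    label, start, seq, end = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    expected = end - start + 1
    # Count letters only: gap characters ("-") and stray spaces are ignored.
    counted = sum(1 for ch in seq if ch.isalpha())
    return label, expected, counted, counted == expected

if __name__ == "__main__":
    for row in (
        "Query: 241 WQLKDSLAIGIVSLLGLLLFFLK 263",
        "Sbjct: 1 MFRNVKELIE-ITKEKQILISDVMIAQEMEVTEKTKEDIFQQMDHNLSVMEAAVQKGLEG 59",
    ):
        print(check_row(row))
```

A row that fails the check has lost or gained characters relative to its coordinates, which is a hint that it should be compared against another alignment of the same sequence before being relied on.
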

SEQ ID 3184 (GBS358) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 176 (lane 6; MW 35kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1032 

A DNA sequence (GBSx1104) was identified in S.agalactiae <SEQ ID 3187> which encodes the amino
acid sequence <SEQ ID 3188>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06216 GB:AP001515 L-serine dehydratase beta subunit [Bacillus halodurans]
Identities = 101/216 (46%), Positives = 156/216 (71%), Gaps = 2/216 (0%)



Query: 4 LKFQSVFDIIGPVMIGPSSSHTAGAVRIGKVVHSIFGE-PSEVTFHLYNSFAKTYQGHGT 62
+K+++VFDIIGPVMIGPSSSHTAGA RIG+V ++FG+ P + Y SFA+TY+GHGT
Sbjct: 1 MKYRTVFDIIGPVMIGPSSSHTAGAARIGRVARTLFGQQPERCDIYFYGSFAETYKGHGT 60

Query: 63 DKALVAGILGMDTDNPDIKNSLEIAHQKGIKIYWDILKDSNSPHPNTAKITVKNGDRSMS 122
D A+V GIL DT +P I SL++A +KG+++Y+ +++ + HPNTAK+ ++ G+ +
Sbjct: 61 DVAIVGGILDFDTFDPRIPRSLQLAKEKGVRVYFHE-EEAITDHPNTAKVVLQKGEDQLE 119

Query: 123 ITGVSIGGGNIQVTELNGFSVSLTMNTPTLIIVHQDIPGMIAKVTDILSDFNINIAQMNV 182
+ GVSIGGG I++ ELNGF + L+ N P +++VH D G+IA V+++L+ INI M V
Sbjct: 120 VVGVSIGGGKIEIVELNGFHLKLSGNHPAILVVHTDRFGVIASVSNMLAKHEINIGHMEV 179

Query: 183 TRESAGEKAIMIIEVDSRDCQQAVKKIEAIPHLHNV 218
+R+ G++A+M+IEVD ++++E +P++ V
Sbjct: 180 SRKEKGKEALMVIEVDQNVDDLLLQELERLPNIVTV 215

A related DNA sequence was identified in S.pyogenes <SEQ ID 3189> which encodes the amino acid
sequence <SEQ ID 3190>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9161> which encodes the amino acid sequence
<SEQ ID 9162>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.300 (Affirmative) < succ>

bacterial membrane Certainty=0.000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 187/223 (83%), Positives = 205/223 (91%), Gaps = 1/223 (0%)

Query: 1 MKHLKFQSVFDIIGPVMIGPSSSHTAGAVRIGKVVHSIFGE-PSEVTFHLYNSFAKTYQG 59
M KFQSVFDIIGPVMIGPSSSHTAGAVRIGKVVHSIFG+ P EVTFHLYNSFAKTY+G
Sbjct: 3 MNTQKFQSVFDIIGPVMIGPSSSHTAGAVRIGKVVHSIFGDIPDEVTFHLYNSFAKTYRG 62

Query: 60 HGTDKALVAGILGMDTDNPDIKNSLEIAHQKGIKIYWDILKDSNSPHPNTAKITVKNGDR 119
HGTDKALVAGI+GM TDNPDIKNSLEIAHQKGIKIYWDILKDSN+PHPNT KI+VK D+
Sbjct: 63 HGTDKALVAGIMGMGTDNPDIKNSLEIAHQKGIKIYWDILKDSNAPHPNTVKISVKKADK 122

Query: 120 SMSITGVSIGGGNIQVTELNGFSVSLTMNTPTLIIVHQDIPGMIAKVTDILSDFNINIAQ 179
++S+TGVSIGGGNIQVTELNGFSVSL+MNTPT++ VH+DIPGMIAKVTDILS NINIA
Sbjct: 123 TLSOTGVSIGGGNIQVTELNGFSVSLSMNTPTIVTVHKDIPGMIAKVTDILSSNNINIAT 182

Query: 180 MNVTRESAGEKAIMIIEVDSRDCQQAVKKIEAIPHLHNVNFFD 222
MNVTRESAGEKA MIIEVDSR+CQ+A +I IPH++NVNFFD
Sbjct: 183 MNVTRESAGEKATMIIEVDSRECQEAANQIAKIPHIYNVNFFD 225

SEQ ID 3188 (GBS151) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 31 (lane 3; MW 50kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 188 (lane 11; MW 25kDa) and in Figure 165
(lanes 14-16; MW 25.3kDa).

The GBS151-GST fusion product was purified (Figure 198, lane 3; Figure 236, lane 8) and used to
immunise mice. The resulting antiserum was used for FACS (Figure 289), which confirmed that the protein
is immunoaccessible on GBS bacteria.

GBS151L was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 127 (lanes 8-10; MW 50kDa). GBS151L was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 127 (lanes 11 & 12; MW 25kDa), in
Figure 128 (lane 7; MW 25kDa) and in Figure 180 (lane 7; MW 25kDa). Purified GBS151L-His is shown
in Figure 232 (lanes 5 & 6) and in Figure 240 (lanes 3 & 4).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1033

A DNA sequence (GBSx1105) was identified in S.agalactiae <SEQ ID 3191> which encodes the amino
acid sequence <SEQ ID 3192>. This protein is predicted to be tRNA
(5-methylaminomethyl-2-thiouridylate)-methyltransferase (trmU). Analysis of this protein sequence reveals the following:

Possible site: 47
>>> Seems to have no N-terminal signal sequence



Final Results

bacterial cytoplasm Certainty=0.2208 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10291> which encodes amino acid sequence <SEQ ID
10292> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04980 GB:AP001511
(5-methylaminomethyl-2-thiouridylate)-methyltransferase
[Bacillus halodurans]
Identities = 250/359 (69%), Positives = 292/359 (80%), Gaps = 6/359 (1%)



Query: 32 RVVVGMSGGVDSSVTALLLKEQGYDVIGVFMKNWDDTDEFGVCTATEDYKDVAAVADQIG 91
RVVVGMSGGVDSSVTALLLKEQGYDVIG+FMKNWDDTDE GVCTATEDY+DV V +Q+G
Sbjct: 10 RVVVGMSGGVDSSVTALLLKEQGYDVIGIFMKNWDDTDENGVCTATEDYQDVVQVCNQLG 69

Query: 92 IPYYSVNFEKEYWDRVFEYFLAEYRAGRTPNPDVMCNKEIKFKAFLDYAMTLGADYVATG 151
I YY+VNFEKEYWD+VF YFL EY+AGRTPNPDVMCNKEIKFKAFL++A+TLGADYVATG
Sbjct: 70 IAYYAVNFEKEYWDKVFTYFLEEYKAGRTPNPDVMCNKEIKFKAFLNHALTLGADYVATG 129

Query: 152 HYAQVTRDENGIVHMLRGADNNKDQTYFLSQLSQEQLQKTLFPLGHLQKPEVRRIAEEAG 211
HYAQV ++ +G ++RG D NKDQTYFL+ LSQ+QL + +FPLGHL+K EVR IAE AG
Sbjct: 130 HYAQV-KNVDGQYQLIRGKDPNKDQTYFLNALSQQQLSRVMFPLGHLEKKEVRAIAERAG 188

Query: 212 LATAKKKDSTGICFIGEKNFKDFLGQYLPAQPGRMMTVDGRDMGEHAGLMYYTIGQRGGL 271
LATAKKKDSTGICFIG+++FK+FL YLPAQPG M T+DG G H GLMYYT+GQR GL
Sbjct: 189 LATAKKKDSTGICFIGKRDFKEFLSSYLPAQPGEMQTLDGEVKGTHDGLMYYTLGQRQGL 248

Query: 272 GIGGQHGGDNKPWFVVGKDLSKNILYVGQGFYHDSLMSTSLTASEIHFTRDMPNEFKLEC 331
GI GG +PWFV+GK+L KNILYVGQGF+H L S LA ++++ ++ EC
Sbjct: 249 GI GGSGEPWFVIGKNLEKNILYVGQGFHHPGLYSEGLRAIKVNWILRRESDEPFEC 304

Query: 332 TAKFRYRQPDSKVTVYVKGNQA-RVVFDDLQRAITPGQAVVFYNEQECLGGGMIDQAYR 389
TAKFRYRQPD KVTVY + + A V+F + QRAITPGQAVVFY+ CLGGG ID +
Sbjct: 305 TAKFRYRQPDQKVTVYPQSDGAVEVLFAEPQRAITPGQAVVFYDGDVCLGGGTIDHVLK 363

A related DNA sequence was identified in S.pyogenes <SEQ ID 3193> which encodes the amino acid
sequence <SEQ ID 3194>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1691 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

RGD motif: 331-333

The protein has homology with the following sequences in the databases:

>GP:BAB04980 GB:AP001511
(5-methylaminomethyl-2-thiouridylate)-methyltransferase
[Bacillus halodurans]
Identities = 255/359 (71%), Positives = 293/359 (81%), Gaps = 6/359 (1%)

Query: 14 RVVVGMSGGVDSSVTALLLKEQGYDVIGVFMKNWDDTDEFGVCTATEDYKDVAAVADKIG 73
RVVVGMSGGVDSSVTALLLKEQGYDVIG+FMKNWDDTDE GVCTATEDY+DV V +++G
Sbjct: 10 RVVVGMSGGVDSSVTALLLKEQGYDVIGIFMKNWDDTDENGVCTATEDYQDVVQVCNQLG 69

Query: 74 IPYYSVNFEKEYWDRVFEYFLAEYRAGRTPNPDVMCNKEIKFKAFLDYAMTLGADYVATG 133
I YY+VNFEKEYWD+VF YFL EY+AGRTPNPDVMCNKEIKFKAFL++A+TLGADYVATG
Sbjct: 70 IAYYAVNFEKEYWDKVFTYFLEEYKAGRTPNPDVMCNKEIKFKAFLNHALTLGADYVATG 129

Query: 134 HYAQVKRDENGTVHMLRGADNGKDQTYFLSQLSQEQLQKTLFPLGHLQKSEVREIAERAG 193
HYAQVK + +G ++RG D KDQTYFL+ LSQ+QL + +FPLGHL+K EVR IAERAG
Sbjct: 130 HYAQVK-NVDGQYQLIRGKDPNKDQTYFLNALSQQQLSRVMFPLGHLEKKEVRAIAERAG 188

Query: 194 LATAKKKDSTGICFIGEKNFKQFLSQYLPAQKGRMMTIDGRDMGEHAGLMYYTIGQRGGL 253
LATAKKKDSTGICFIG+++FK+FLS YLPAQ G M T+DG G H GLMYYT+GQR GL
Sbjct: 189 LATAKKKDSTGICFIGKRDFKEFLSSYLPAQPGEMQTLDGEVKGTHDGLMYYTLGQRQGL 248

Query: 254 GIGGQHGGDNQPWFVVGKDLSQNILYVGQGFYHEALMSNSLDASVIHFTREMPEEFTFEC 313
GI GG +PWFV+GK+L +NILYVGQGF+H L S LA +++ + FEC
Sbjct: 249 GI GGSGEPWFVIGKNLEKNILYVGQGFHHPGLYSEGLRAIKVNWILRRESDEPFEC 304

Query: 314 TAKFRYRQPDSHVAVHVRGDKA-EVVFAEPQRAITPGQAVVFYDGKECLGGGMIDMAYK 371
TAKFRYRQPD V V+ + D A EV+FAEPQRAITPGQAVVFYDG CLGGG ID K
Sbjct: 305 TAKFRYRQPDQKVTVYPQSDGAVEVLFAEPQRAITPGQAVVFYDGDVCLGGGTIDHVLK 363

An alignment of the GAS and GBS proteins is shown below.

Identities = 332/377 (88%), Positives = 349/377 (92%)

Query: 21 GRILMTDNSNIRVVVGMSGGVDSSVTALLLKEQGYDVIGVFMKNWDDTDEFGVCTATEDY 80
G MTDNS IRVVVGMSGGVDSSVTALLLKEQGYDVIGVFMKNWDDTDEFGVCTATEDY
Sbjct: 3 GEFFMTDNSKIRVVVGMSGGVDSSVTALLLKEQGYDVIGVFMKNWDDTDEFGVCTATEDY 62

Query: 81 KDVAAVADQIGIPYYSVNFEKEYWDRVFEYFLAEYRAGRTPNPDVMCNKEIKFKAFLDYA 140
KDVAAVAD+IGIPYYSVNFEKEYWDRVFEYFLAEYRAGRTPNPDVMCNKEIKFKAFLDYA
Sbjct: 63 KDVAAVADKIGIPYYSVNFEKEYWDRVFEYFLAEYRAGRTPNPDVMCNKEIKFKAFLDYA 122

Query: 141 MTLGADYVATGHYAQVTRDENGIVHMLRGADNNKDQTYFLSQLSQEQLQKTLFPLGHLQK 200
MTLGADYVATGHYAQV RDENG VHMLRGADN KDQTYFLSQLSQEQLQKTLFPLGHLQK
Sbjct: 123 MTLGADYVATGHYAQVKRDENGTVHMLRGADNGKDQTYFLSQLSQEQLQKTLFPLGHLQK 182

Query: 201 PEVRRIAEEAGLATAKKKDSTGICFIGEKNFKDFLGQYLPAQPGRMMTVDGRDMGEHAGL 260
EVR IAE AGLATAKKKDSTGICFIGEKNFK FL QYLPAQ GRMMT+DGRDMGEHAGL
Sbjct: 183 SEVREIAERAGLATAKKKDSTGICFIGEKNFKQFLSQYLPAQKGRMMTIDGRDMGEHAGL 242

Query: 261 MYYTIGQRGGLGIGGQHGGDNKPWFVVGKDLSKNILYVGQGFYHDSLMSTSLTASEIHFT 320
MYYTIGQRGGLGIGGQHGGDN+PWFVVGKDLS+NILYVGQGFYH++LMS SL AS IHFT
Sbjct: 243 MYYTIGQRGGLGIGGQHGGDNQPWFVVGKDLSQNILYVGQGFYHEALMSNSLDASVIHFT 302

Query: 321 RDMPNEFKLECTAKFRYRQPDSKVTVYVKGNQARVVFDDLQRAITPGQAVVFYNEQECLG 380
R+MP EF ECTAKFRYRQPDS V V+V+G++A VVF + QRAITPGQAVVFY+ +ECLG
Sbjct: 303 REMPEEFTFECTAKFRYRQPDSHVAVHVRGDKAEVVFAEPQRAITPGQAVVFYDGKECLG 362

Query: 381 GGMIDQAYRDDKICQYI 397
GGMID AY++ + CQYI
Sbjct: 363 GGMIDMAYKNGQPCQYI 379

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1034 

A DNA sequence (GBSx1106) was identified in S.agalactiae <SEQ ID 3195> which encodes the amino
acid sequence <SEQ ID 3196>. Analysis of this protein sequence reveals the following:

Possible site: 29
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-12.84 Transmembrane 141 - 157 ( 134 - 165)
INTEGRAL Likelihood =-11.78 Transmembrane 40 - 56 ( 36 - 73)
INTEGRAL Likelihood = -4.35 Transmembrane 68 - 84 ( 65 - 86)
INTEGRAL Likelihood = -3.50 Transmembrane 180 - 196 ( 175 - 199)

Final Results

bacterial membrane Certainty=0.6137 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB15390 GB:Z99121 similar to hypothetical proteins [Bacillus subtilis]
Identities = 71/202 (35%), Positives = 120/202 (59%), Gaps = 5/202 (2%)

Query: 1 MISKFILAFMAFFAIMNPISNLPAFMALVADDDQKISRRIAAKGVLLAFVIIVIFVLSGH 60
MS + F++ FA+ NPI N+P F+ L + IA K +L+F I+ F++ GH
Sbjct: 2 MFSFIVHVFISLFAVSNPIGNVPIFLTLTEGYTAAERKAIARKAAILSFFILAAFLVFGH 61

Query: 61 LLFNLFGITLRALKISGGILVGIIGYKMINGIHSPTNK-NLEEHKD--DPMNVAVSPLAM 117
L+F LF I + AL+++GGI + I Y ++N S + +EHK+ + +++V+PL++
Sbjct: 62 LIFKLFDINIHALRVAGGIFIFGIAYNLLNAKESHVQSLHHDEHKESKEKADISVTPLSI 121

Query: 118 PLLAGPGTIATAMGLSSG--GLSGKLITILAFAILCVIMYVILISANEITKFLGKNAMTI 175
P++AGPGTIAT M LS+G G+ ++ A + + ++ +I+ LGK M +
Sbjct: 122 PIIAGPGTIATVMSLSAGHSGIGHYAAVMIGIAAVIALTFLFFHYSAFISSKLGKTEMNV 181

Query: 176 ITKMMGLILMTIGIEMLITGIK 197
IT++MGLIL + + M+ G+K
Sbjct: 182 ITRLMGLILAVVAVGMIGAGLK 203

No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8715> and protein <SEQ ID 8716> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
McG: Discrim Score: 9.79
GvH: Signal Score (-7.5): -1.53
Possible site: 29
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 4 value: -12.84 threshold: 0.0
INTEGRAL Likelihood =-12.84 Transmembrane 141 - 157 ( 134 - 165)
INTEGRAL Likelihood =-11.78 Transmembrane 40 - 56 ( 36 - 73)
INTEGRAL Likelihood = -4.35 Transmembrane 68 - 84 ( 65 - 86)
INTEGRAL Likelihood = -3.50 Transmembrane 180 - 196 ( 175 - 199)
PERIPHERAL Likelihood = 1.27 110
modified ALOM score: 3.07

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.6137 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
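
The "INTEGRAL Likelihood" lines above each report a likelihood value, a residue range labelled "Transmembrane", and a second range in parentheses. A minimal Python sketch for collecting these lines into an ordered list is shown below for illustration. The function name `parse_integral_lines` and the dictionary keys `span` and `outer_span` are hypothetical labels chosen here; the sketch keeps both ranges exactly as printed and does not interpret them further.

```python
import re

# One line looks like:
#   INTEGRAL Likelihood =-12.84 Transmembrane 141 - 157 ( 134 - 165)
INTEGRAL_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_integral_lines(text):
    """Return the reported segments sorted by the start of the first range."""
    segments = []
    for m in INTEGRAL_RE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "span": (int(m.group(2)), int(m.group(3))),        # range after "Transmembrane"
            "outer_span": (int(m.group(4)), int(m.group(5))),  # range in parentheses
        })
    return sorted(segments, key=lambda s: s["span"][0])

if __name__ == "__main__":
    report = """INTEGRAL Likelihood =-12.84 Transmembrane 141 - 157 ( 134 - 165)
INTEGRAL Likelihood =-11.78 Transmembrane 40 - 56 ( 36 - 73)
INTEGRAL Likelihood = -4.35 Transmembrane 68 - 84 ( 65 - 86)
INTEGRAL Likelihood = -3.50 Transmembrane 180 - 196 ( 175 - 199)"""
    for seg in parse_integral_lines(report):
        print(seg)
```
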

The protein has homology with the following sequences in the databases:

ORF00620(301 - 891 of 1209)
OMNI|NT01BS3953 (11 - 212 of 220) conserved hypothetical protein
%Match = 15.8
%Identity = 35.5 %Similarity = 61.5
Matches = 71 Mismatches = 74 Conservative Sub.s = 52

96 126 156 186 216 246 276 306
VQLSSDIVNLTVKLQFT*KVIKQGLCLMIYNEQSHQVKLLFFIMNK^

VQRLSTRRYMMF
10

336 366 396 426 456 486 516 546
SKFILAFMAFFAIMNPISNLPAFMALVADDDQKISRRIAAKGVLLAFVIIVIFV^

SFIVHVFISLFAVSNPIGNVPIFLTLTEGYTAAERKAIARKAAILSFFILAAFLVFGHLIFKLFDINIHALRVAGGIFIF
30 40 50 60 70 80 90

576 603 627 657 687 711 741 771
IIGYRMINGIHSPTNK-NLEEHKD--DPMNVAVSPLAMPLLAGPGTIATAMGLSSG--GLSGKLITILAFAILCVIMYVI
GIAYNLLNAKESHVQSLHHDEHKESKEKADISVTPLSIPIIAGPGTIATVMSLSAGHSGIGHYAAVMIGIAAVIALTFLF
110 120 130 140 150 160 170

801 831 861 891 921 951 981 1011
LISANEITKFLGKNAMTIITKMMGLILMTIGIEMLITGIKlGF
= = |: III I = = 1= 1=1
FHYSAFISSKLGKTEMNVITRLMGLILAVVAVGMIGAGLKGMFPVLTS
190 200 210 220

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1035 

A DNA sequence (GBSx1107) was identified in S.agalactiae <SEQ ID 3197> which encodes the amino
acid sequence <SEQ ID 3198>. Analysis of this protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1747 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10289> which encodes amino acid sequence <SEQ ID
10290> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC45494 GB:U80409 glucose inhibited division protein homolog
GidA [Lactococcus lactis subsp. cremoris]
Identities = 394/524 (75%), Positives = 458/524 (87%), Gaps = 2/524 (0%)

Query: 13 KTLLATINLEMLAFMPCNPSIGGSAKGIVVREIDALGGEMGKNIDKTYIQMKMLNTGKGP 72
KTLL TINL M+AFMPCNPSIGGSAKGIVVREIDALGGEMG+NIDKTYIQMKMLNTGKGP
Sbjct: 12 KTLLMTINLNMVAFMPCNPSIGGSAKGIVVREIDALGGEMGRNIDKTYIQMKMLNTGKGP 71

Query: 73 AVRALRAQADKALYAQTMKQTVEKQENLTLRQAMIDEILVEDGK--VVGVRTATNQKFSA 130
AVRALRAQADK YA +MK TV QENLTLRQ M++E++++D K V+GVRT+T ++ A
Sbjct: 72 AVRALRAQADKDEYAASMKNTVSDQENLTLRQGMVEELILDDEKQKVIGVRTSTGTQYGA 131

Query: 131 KSVVITTGTALRGEIILGDLKYSSGPNNSLASVTLADNLRDLGLEIGRFKTGTPPRVKAS 190
K+V+ITTGTALRGEII+G+LKYSSGPNNSL+S+ LADNLR++G EIGRFKTGTPPRV AS
Sbjct: 132 KAVIITTGTALRGEIIIGELKYSSGPNNSLSSIGLADNLREIGFEIGRFKTGTPPRVLAS 191

Query: 191 SINYEKTEIQPGDEQPNHFSFMSRDEDYITDQVPCWLTYTNTLSHDIINQNLHRAPMFSG 250
SI+Y+KTEIQPGDE PNHFSFMS DEDY+ DQ+PCWLTYT SH I+ NLHRAP+FSG
Sbjct: 192 SIDYDKTEIQPGDEAPNHFSFMSSDEDYDKDQIPCWLTYTTENSHTILRDNLHRAPLFSG 251

Query: 251 IVKGVGPRYCPSIEDKIVRFADKERHQLFLEPEGRYTEEVYVQGLSTSLPEDVQVDLLRS 310
IVKGVGPRYCPSIEDKI RFADK RHQLFLEPEGR TEEVY+ GLSTS+PEDVQ DL++S
Sbjct: 252 IVKGVGPRYCPSIEDKITRFADKPRHQLFLEPEGRNTEEVYIGGLSTSMPEDVQFDLVKS 311

Query: 311 IKGLENAEMMRTGYAIEYDIVLPHQLRATLETKVIAGLFTAGQTNGTSGYEEAAGQGLVA 370
I GLENA+MMR GYAIEYD+V+PHQLR TLETK+I+GLFTAGQTNGTSGYEEAAGQGLVA
Sbjct: 312 IPGLENAKMMRPGYAIEYDVVMPHQLRPTLETKLISGLFTAGQTNGTSGYEEAAGQGLVA 371

Query: 371 GINAALKVQGKPELILKRSDAYIGVMIDDLVTKGTLEPYRLLTSRAEYRLILRHDNADMR 430
GINAALK+QGKPE ILKRS+AYIGVMIDDLVTKGTLEPYRLLTSRAEYRLILRHDNAD R
Sbjct: 372 GINAALKIQGKPEFILKRSEAYIGVMIDDLVTKGTLEPYRLLTSRAEYRLILRHDNADRR 431

Query: 431 LTEIGYEIGLVDEERYAIFKKRQMQFENELERLDSIKLKPVSETNKRIQELGFKPLTDAL 490
LTEIG ++GLV + ++ ++ + QF+ E++RL+S KLKP+ +T +++ +LGF P+ DAL
Sbjct: 432 LTEIGRQVGLVSDAQWEHYQAKMAQFDREMKRLNSEKLKPLPDTQEKLGKLGFGPIKDAL 491

Query: 491 TAKEFMRRPQITYAVATDFVGCADEPLDSKVIELLETEIKYEGY 534
T EF++RP++ Y DF+G A E +D V EL+ETEI YEGY
Sbjct: 492 TGAEFLKRPEVNYDEVIDFIGQAPEVIDRTVSELIETEITYEGY 535



A related DNA sequence was identified in S.pyogenes <SEQ ID 3199> which encodes the amino acid
sequence <SEQ ID 3200>. Analysis of this protein sequence reveals the following:

Possible site: 28

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.1064 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 530/610 (86%), Positives = 574/610 (93%)

Query: 1 MFASLAASRMGCKTLLATINLEMLAFMPCNPSIGGSAKGIVVREIDALGGEMGKNIDKTY 60
+FASLA SRMGCKTLLATINL+MLAFMPCNPSIGGSAKGIVVREIDALGGEMGKNIDKTY
Sbjct: 21 VFASLATSRMGCKTLLATINLDMLAFMPCNPSIGGSAKGIVVREIDALGGEMGKNIDKTY 80

Query: 61 IQMKMLNTGKGPAVRALRAQADKALYAQTMKQTVEKQENLTLRQAMIDEILVEDGKVVGV 120
IQMKMLNTGKGPAVRALRAQADK+LYA+ MK TVEKQ NLTLRQ MID+ILVEDG+VVGV
Sbjct: 81 IQMKMLNTGKGPAVRALRAQADKSLYAREMKHTVEKQANLTLRQTMIDDILVEDGRVVGV 140

Query: 121 RTATNQKFSAKSVVITTGTALRGEIILGDLKYSSGPNNSLASVTLADNLRDLGLEIGRFK 180
TAT QKF+AK+VV+TTGTALRGEIILG+LKYSSGPNNSLASVTLADNL+ LGLEIGRFK
Sbjct: 141 LTATGQKFAAKAVVVTTGTALRGEIILGELKYSSGPNNSLASVTLADNLKKLGLEIGRFK 200

Query: 181 TGTPPRVKASSINYEKTEIQPGDEQPNHFSFMSRDEDYITDQVPCWLTYTNTLSHDIINQ 240
TGTPPRVKASSINY++TEIQPGD++PNHFSFMS+D DY+ DQ+PCWLTYTN SHDIINQ
Sbjct: 201 TGTPPRVKASSINYDQTEIQPGDDKPNHFSFMSKDMJYLKDQIPCWLTYTNQTSHDIINQ 260

Query: 241 NLHRAPMFSGIVKGVGPRYCPSIEDKIVRFADKERHQLFLEPEGRYTEEVYVQGLSTSLP 300
NL+RAPMFSGIVKGVGPRYCPSIEDKIVRFADKERHQLFLEPEGR TEEVYVQGLSTSLP
Sbjct: 261 NLYRAPMFSGIVKGVGPRYCPSIEDKIVRFADKERHQLFLEPEGRDTEEVYVQGLSTSLP 320

Query: 301 EDVQVDLLRSIKGLENAEMMRTGYAIEYDIVLPHQLRATLETKVIAGLFTAGQTNGTSGY 360
EDVQ DL+ SIKGLE AEMMRTGYAIEYDIVLPHQLRATLETK+I+GLFTAGQTNGTSGY
Sbjct: 321 EDVQKDLIHSIKGLEKAEMMRTGYAIEYDIVLPHQLRATLETKLISGLFTAGQTNGTSGY 380

Query: 361 EEAAGQGLVAGINAALKVQGKPELILKRSDAYIGVMIDDLVTKGTLEPYRLLTSRAEYRL 420
EEAAGQGL+AGINAALKVQGKPELILKRSDAYIGVMIDDLVTKGTLEPYRLLTSRAEYRL
Sbjct: 381 EEAAGQGLIAGINAALKVQGKPELILKRSDAYIGVMIDDLVTKGTLEPYRLLTSRAEYRL 440

Query: 421 ILRHDNADMRLTEIGYEIGLVDEERYAIFKKRQMQFENELERLDSIKLKPVSETNKRIQE 480
ILRHDNADMRLTEIG +IGLVD+ER+ F+ ++ QF+NEL+RL+SIKLKP+ ETN R+Q+
Sbjct: 441 ILRHDNADMRLTEIGRDIGLVDDERWKAFEIKKNQFDKELKRLNSIKLKPIKETNDRVQD 500

Query: 481 LGFKPLTDALTAKEFMRRPQITYAVATDFVGCADEPLDSKVIELLETEIKYEGYIKKALD 540
LGFKPLTDA+TAKEFMRRP+I YA A FVG A E LD+K+IELLETEIKYEGYI+KALD
Sbjct: 501 LGFKPLTDAMTAKEFMRRPEIDYATAVSFVGPAAEDLDAKIIELLETEIKYEGYIRKALD 560

Query: 541 QVAKMKRMEEKKIPPHIDWDDIDSIATEARQKFKKINPETLGQASRISGVNPADISILMV 600
QVAKMKRMEEKRIP +IDWD IDSIATEARQKFKKINPET+GQASRISGVNPADISILM+
Sbjct: 561 QVAKMKRMEEKRIPTNIDWDAIDSIATEARQKFKKINPETIGQASRISGVNPADISILMI 620

Query: 601 YLEGRQKGRK 610
YLEG K +
Sbjct: 621 YLEGNGKAHR 630


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1036

A DNA sequence (GBSx1108) was identified in S.agalactiae <SEQ ID 3201> which encodes the amino
acid sequence <SEQ ID 3202>. Analysis of this protein sequence reveals the following:

Possible site: 44
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07750 GB:AP001520 unknown conserved protein in B. subtilis
[Bacillus halodurans]
Identities = 205/644 (31%), Positives = 362/644 (55%), Gaps = 28/644 (4%)

Query: 35 LLIAI FVALS FWALLYYQ KITYELSEVEQIELBNDQTE 73 

++ + VAL F++AL +YQ +I++E +1 L+ + 

Sbjct: 14 VIALLAVALVFLIALSFYQWQLGVIGVLLLLVIAI FSLRARI SFERDLEQYI STLSYRVH 73 

20 

Query: 74 VSLKSLLEQMPVGVIQFDLETNDIEWFNPYA-ELIFTGDNGHFQSATVKDIITSRRNGTA 132 

+ + + Q+PVG+I ++ + ++W NPYA E + + +++ + GT 

Sbjct: 74 KAGEFAVTQLPVGMILYNDQLR-VQWVNPYAAEHLPKAEIDASLEELSPELVRALEEGTD 132 

Query: 133 GQSFEYGDNK^SAYLDTETGVFYFFDNFMGMSRNYDSSMLRPVIGIISIDNYDDIMDTML 192

Q + Y + YFFD R + +PV+ I +DNYD++ M 

Sbjct: 133 EQKIVIEEKTYDCTFKPNERIilYFFDlTESERMHQQFEESQPVLTFIYLDNYDEVTQGME 192 






Query: 193 FJffiMSKINAFOTSFISDFTQSKNIFYRRVNMDRYYIFTDYSVUn'LIKDKFDILNEFRKR 252 

+ S++ + VTS ++ + ++F RR DR+ Y L + K KF IL+E R+ 

Sbjct: 193 DQTOSRLMSQVTSSLNQWANEHDLFLRRTAADRFIAVMSYGSLLAIEKrKFGILDEIRET 252 



Query: 253 AQENHLSLTLSMGISYGDGNHNQIGQIALEKLNTALVRGGDQIVVRENDSSKKALYFGGG 312 
+ + LTLS+G+ YGD + ++GQ+A +L+ AL RGGDQ+ +++ K ++GG 

Sbjct: 253 TGKEKIPLTLSIGVGYGDLSLRELGQLAQSSLDLALGRGGDQVAIKQKTG--KVRFYGGK 310



Query: 313 AVSTIKRSRTRTRAMMTAISDRLKVVDSVFIVGHRKLDMDALGASVGMQFFASNIVNASY 372 

+ + KR+R RR + A+D+ DV ++GH+ DMDA+GA++G+ A ++ 
Sbjct: 311 SNAMEKRTRVRARVISHALRDFVLESDRVIVMGHKNPDMDAVGAAIGILKIAEVNDREAF 370 

Query: 373 WYDPNDMNSDIERAIDYLQEDGET- -RLVSVERAFELITQNSLLVMVDHSKTALTLSKE 430 

W DPND+N D+ + ++ ++++ + + ++ E + EL+T+ +LLV+VD K ++ + 
Sbjct: 371 WLDPNDVNPDVSKLMEEVEKNEQLWDKFITPEESLELMTEETLLVIVDTHKPSMVIEPR 430 

Query: 431 FFNKFADVIVVDHHRRDEDFPKNAVLSFIESGASSASELVTELIQFQQAKDKLSRSQASI 490

+ V+V+DHHRR E+F ++ VL ++E ASS +ELVTEL+++Q K K+ +++ 
Sbjct: 431 LLDYVERVVvLDHHRRGEEFIEDPvIiVYMEPYASSTAELVTELLEYQPKKLKMDILESTA 490 

Query: 491 imGIMLDTRNFASNVTSRTFDVASYLRGLGSNSMAIQKISATDFDEYRLINELILKGER 550 
L+AG+++DT++FA +RTFD AS+LR G++++ +QK+ D + Y +L+ +

Sbjct: 491 LLAGMIVDTKSFAIRTGARTFDAASFLRSHGADTVIiVQKLLKEDLNHYvKRAKLVETAKL 550 

Query: 551 IYDNIIVATGEEHKvYSHVIASKAADTMLTMAGIEATFVITKNSSN-IGISARSRNNINV 609 
D + +AT E + S ++ ++AADT+LTM G+ A+FVI++ + ISARS ++NV 

Sbjct: 551 YRDGMAIATAREEEAVSQLLIAQAADTLLTMKGVVASFVISRRHDGWSISARSLGDVNV 610

Query: 610 QRIMEKLGGGGHFSFAACQIQDKSVKQVRRMLLEIIDEDLRENS 653 

Q IME L GGGH + AA Q +D ++++ L E 1D+ L S 
Sbjct: 611 QLIMESLDGGGHLTNAATQFEDATLEEAEAKLKEAIDQYLEGGS 654 


A related DNA sequence was identified in S.pyogenes <SEQ ID 3203> which encodes the amino acid 
sequence <SEQ ID 3204>. Analysis of this protein sequence reveals the following: 



Possible site: 25 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-18.57 Transmembrane 33 - 49 ( 6 - 56) 
INTEGRAL Likelihood =-10.14 Transmembrane 12 - 28 ( 6 - 32) 



Final Results 

bacterial membrane Certainty=0.8429 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB07750 GB:AP001520 unknown conserved protein in B. subtilis 
[Bacillus halodurans] 
Identities = 199/659 (30%) , Positives = 367/659 (55%) , Gaps = 16/659 (2%) 

Query: 1 MKKF---RFETIHLI-MMGLILFGLLALCVSIMQSKILILLAIFLVLLFW-ALLWYQKE 55 

M KF R+ H+I ++ + L L+AL Q ++ +L + ++ +F + A + ++++ 

Sbjct: 1 MPKFLLKRWHGYHVIALLAVALVFLIALSFYQWQLGVIGVLLLLVIAIFSLRARISFERD 60 

Query: 56 AYQLSDIAHIELLNEQTEDNLKTLLDNMPVGWQFDQETNAVEWYNPYA-ELIFTTEEGF 114 

Q +1 L+ + + + +PVG++ ++ + V+W NPYA E + E 

Sbjct: 61 LEQ YISTLSYRVHKAGEEAVTQLPVGMILYNDQLR-VQWVNPYAAEHLPKAEIDA 114 

Query: 115 IQNGLIQQIITEKRREDISQTFEVSGNKYTSYIDVSSGIFYFFDSFVGNRQLADASMLRP 174 

L +++ Q + Y + + YFFD R +P 

Sbjct: 115 SLEELSPELVRALEEGTDEQKIVIEEKTYDCTFKPNERLIYFFDITESERMHQQFEESQP 174 

Query: 175 WGIISVDNYDDITDDLSDADTSKINSFVANFIDEFMESKRIFYRRVNMDRYYFFTDFKT 234 

V+ I +DNYD++T + D S++ S V + ++++ +F RR DR+ + + 

Sbjct: 175 VLTFIYLDNYDEVTQGMFJDQTOSRIjMSQWSStNQWANEHDLFLRRTAADRFIAvMSYGS 234 

Query: 235 IOTLMDNKFSVLEEFRKEAQDAQRPLTLSIGISTOEFjNHSQIGQVALENMIALTOGGDQ 294 

L + KF +L+E R+ + PLTLSIG+ +G+ + ++GQ+A +L++AL RGGDQ 

Sbjct: 235 LIAIEKTKFGII£>EIRETTGKEKIPLTLS1GVGYGDLSLRELGQIAQSSLDIALGRGGDQ 294 

Query: 295 IVIRENADHTNPIYFGGGSVSOTKRSRTRTRAMMTAISDRIKMVDNVFIVGHRKLDMDAL 354 

+ I++ ++GG S + KR+R RR + A+D+ DV ++GH+ DMDA+ 

Sbjct: 295 VAIKQKTGKVR- - FYGGKSNAMEKRTRVRARVISHALRDFVLESDRVI VMGHKNPDMDAV 352 

Query: 355 GSAVGMQFFAGNIIENSFAVYNPDEMSPDIERAIERLQADGKT--RLISVSQAMGLVTPR 412 

G+A+G+ A +F V +P++++PD+ + +E ++ + + +1+ +++ L+T 

Sbjct: 353 GAAIGILKIAEVNDREAFWLDPNDTOPDVSKLMEEVEKNEQLWDKFITPEESLELMTEE 412 

Query: 413 SLLVMVDHSKISLTLSKEFYEQFQNVIVVDHHRRDDDFPDNAILTFIESGASSAAELVTE 472

+LLV+VD K S+ + + + V+V+DHHRR ++F ++ +L ++E ASS AELVTE

Sbjct: 413 TLLVIVDTHKPSMVIEPRLLDYVERVVVLDHHRRGEEFIEDPVLVYMEPYASSTAELVTE 472

Query: 473 LIQFQNAKKCLNKIQASVLMAGIMLDTKNFSTRVTSRTFDVASYLRSKGSDSVEIQNISA 532 

L+++Q K ++ ++++ L+AG+++DTK+F+ R +RTFD AS+LRS G+D+V +Q + 
Sbjct: 473 LLEYQPKKLKMDILESTALLAGMIVDTKSFAIRTGARTFDAASFLRSHGADTVLVQKLLK 532 

Query: 533 TDFEEYKQINEI1LQGERLGDSIIVAAGEKNHLYSNVIASKAADTILSMAHVEASFVLVE 592 

D Y + +++ + D + +A + S ++ ++AADT+L+M V ASFV+ 
Sbjct: 533 EDLNHYVKRAKLVETAKLYRDGMAIATAREEEAVSQLLIAQAADTLLTMKGVVASFVISR 592 

Query: 593 TASHKIAISARSRSKINVQRVMEKLGGGGHFNLAACQLTDISLPQAKYLLLKTINMTMK 651

++ISARS +NVQ +ME L GGGH AA Q D +L +A+ L + 1+ ++ 
Sbjct: 593 RHDGWSISARSLGDVNVQLIMESLDGGGHLTNAATQFEDATLEEAEAKLKEAIDQYLE 651 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 428/658 (65%) , Positives = 547/658 (83%) , Gaps = 1/658 (0%) 

Query: 1 MKRFRFATVHLVLIGLILFGLIAICTRLFQSYTALLIAIFV7AL 60 

MK+FRF T+HL+++GLILFGLLA+CV + QS +LLAIF+ L FWALL+YQK Y+LS 
Sbjct: 1 MKKFRFETIHLIMMGLILFGLLALCVSIMQSKILIIiIiAIFLVLLFVVALLWYQKEAYQLS 60 



Query: 61 EVEQIELLNDQTEVSLKSLLEQMPVGVIQFDLETNDIEWFNPYAELIFTGDNGHFQSATV 120

++ IELLN+QTE +LK+LL+ MPVGV+QFD ETN +EW+NPYAELIFT + G Q+ +
Sbjct: 61 DLAHIELLNEQTEDNLKTLLDNMPVGVVQFDQETNAVEWYNPYAELIFTTEEGFIQNGLI 120

Query: 121 KDIITSRRNGTAGQSFEYGDNKYSAYLDTETGVFYFFDNFMGNRRNYDSSMLRPVIGIIS 180
+ IIT +R Q+FE NKY++Y+D +G+FYFFD+F+GNR+ D+SMLRPV+GIIS

Sbjct: 121 QQIITEKRREDISQTFEVSGNKYTSYIDVSSGIFYFFDSFVGNRQIADASMLRPVVGIIS 180

Query: 181 IDNYDDIMDTMLEADMSKINAFVTSFISDFTQSKMIFYRRVimDRYYIFTDYSVLNTLIK 240 

+DNYDDI D + +AD SKIN+FV +FI +F +SK IFYRRVNMDRYY FTD+ LN L+ 
Sbjct: 181 VDOTDDITDDLSDADTSKINSFVANFIDEFMESKRIFYRRW^RYYFFTDFKTLNDLMD 240 

Query: 241 DKFDIMJEFRKRAQENHLSLTLSMGISYGTONHNQIGQIALENIiNTALVRGGDQIVVREN 300 

+KF +L EFRK AQ+ LTLS+GIS+G+ NH+QIGQ+ALENLN ALVRGGDQIV+REN 
Sbjct: 241 NKFSVLEEFRKEAQDAQRPLTLSIGISFGEENHSQIGQVALENmiALVRGGDQIVIREN 300 

Query: 301 DSSKKALYFGGGAVSTIKRSRTRTRAMMTAISDRLKVVDSVFIVGHRKLDMDALGASVGM 360 

+YFGGG+VST+KRSRTRTRAMMTAISDR+K+VD+VFIVGHRKLDMDALG++VGM 
Sbjct: 301 ADHTNPIYFGGGSVSTVKRSRTRTRAMMTAISDRIKMVDNVFIVGHRKLDMDALGSAVGM 360 

Query: 361 QFFASNIVNASYWYDPIvIDMNSDIERAIDYLQEDGETRLVSVERAFELITQNSLLVMVDH 420 

QFFA NI+ S+ VY+P++M+ DIERAI+ LQ DG+TRL+SV +A L+T SLLVMVDH 
Sbjct: 361 QFFAGNIIENSFAVYNPDEMSPDIERAIERLQADGKTRLISVSQAMGLVTPRSLLVMVDH 420 

Query: 421 SKTALTLSKEFFNKFADVIWDHHRRDEDFPKNAVLSFIESGASSASELVTELIQFQQAK 480 

SK +LTLSKEF+ +F +VIWDHHRRD+DFP NA+L+FIESGASSA+ELVTELIQFQ AK 
Sbjct: 421 SKISLTLSKEFYEQFQNVIWDHHRRDDDFPDMAILTFIESGASSAAELVTELIQFQNAK 480 

Query: 481 DK1SRSQASILMAGIMLDTRNFASNVTSRTFDVASYLRGLGSNSMAIQKISATDFDEYRL 540 

L++ QAS+LMAGIMLDT+NF++ VTSRTFDVASYLR GS+S+ IQ ISATDF+EY+ 
Sbjct: 481 KCLNKIQASVLMAGIMLDTKNFSTRVTSRTFDVASYLRSKGSDSVEIQNISATDFEEYKQ 540 

Query: 541 INELILKGERI YDNI IVATGEEHKVYSHVIASKAADTMLTMftGIEATFVITKNSSN- IGI 599 

INE+IL+GER+ D+IIVA GE++ +YS+VIASKAADT+L+MA +EA+FV+ + +S+ I I 
Sbjct: 541 INEI ILQGERLGDS I IVAAGEKNHLYSNVIASKAADTILSMAHVEASFVLVETASHKIAI 600 

Query: 600 SARSRNNINVQRIMEKLGGGGHFSFAACQIQDKSVKQVRRMLLEIIDEDLRENSTVEN 657 

SARSR+ INVQR+MEKLGGGGHF+ AACQ+ D S+ Q + +LL+ 1+ ++E VE+ 
Sbjct: 601 SARSRSKINVQRWEKLGGGGHFNLAACQLTDISLPQAKYLLLKTINMTMKETGEVES 658 

A related GBS gene <SEQ ID 8717> and protein <SEQ ID 8718> were also identified. Analysis of this
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 

McG : Discrim Score : 13.82 

GvH: Signal Score (-7.5): -0.890001 

Possible site: 44 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 2.97 threshold: 0.0 
PERIPHERAL Likelihood = 2.97 574 
modified ALOM score: -1.09 

*** Reasoning Step: 3 

Final Results 

bacterial outside Certainty=0.3000 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

31.3/55.8% over 631aa 
Bacillus subtilis 

EGAD|19304| hypothetical 74.3 kd protein in rpli-cotf intergenic region Insert
characterized

SP|P37484|YYBT_BACSU HYPOTHETICAL 74.3 KDA PROTEIN IN RPLI-COTF INTERGENIC REGION. Insert
characterized

GP|467336|dbj|BAA05182.1||D26185 unknown Insert characterized
GP|2636598|emb|CAB16088.1||Z99124 yybT Insert characterized
PIR|S65976|S65976 yybT protein - Insert characterized
ORF00251 (364 - 2241 of 2580) 

EGAD|19304|BS4045 (20 - 651 of 659) hypothetical 74.3 kd protein in rpli-cotf intergenic
region {Bacillus subtilis} SP|P37484|YYBT_BACSU HYPOTHETICAL 74.3 KDA PROTEIN IN RPLI-COTF
INTERGENIC REGION. GP|467336|dbj|BAA05182.1||D26185 unknown {Bacillus subtilis}
GP|2636598|emb|CAB16088.1||Z99124 yybT {Bacillus subtilis} PIR|S65976|S65976 yybT protein - Bacillus
subtilis
%Match = 18.5

%Identity = 31.2 %Similarity = 55.8 

Matches = 197 Mismatches = 271 Conservative Sub.s = 155 

258 288 318 348 378 408 438 468 

N***CSPLFIRGVLCYN* vLRGYLMKRFRFATVHLVLIGLILFGLLAICWLFQSYTALLLAIFVALSFWALLYYQKIT

I I : : |:| | | :| | : | |:: :: 

MPSFYEKPLFRYPIYALIALSIITILISFYFNWILGTVEVLLLAVILFFIKRAD 
10 20 30 40 50 

522 552 582 612 666 696

YEL-SEVEQ-IELLNDQTEVSLKSLLEQMPVGVIQFDLETNDIEWFNPYAELIFTGDN- -GHFQSATVKDIITSRRNGTA 

: |:= I |= : : = | :||:|:: |: : ||| ||: | | | : :: : 

SLIRQEIDAYISTLSYRLKKVGEEALMEMPIGIMLFN-DQYYIEWANPFLSSCFNESTLVGRSLYDTCESWPLIKQEVE 
70 80 90 100 110 120 130 


726 756 786 816 846 876 906 936 

GQSFEYGDNKYSAYLDTETGVFYFFDNFMGNRRNYDSSMLRPVIGIISIDNYDDIMDTMLEADMSKINAXVTSFXSDFTQ 

||::: = I 1= I Hlllh : : I =1= lib = = I 

SETVTIiNDRKFRWIKRDERLLYFFDVTEQIQIEKLYENERTVIAYIFLDNYDDvTQGLDDQTRSTMNSQOT 
150 160 170 180 190 200 210

966 996 1026 1056 1086 1116 1146 1176 

SKNIFYRRVNMDRYYIFTDYSVIiNTLIKDKFDIIilffiFRK^ 

II :| = =h = =1 I II Ihl h: = ::||||:|: = = = 1 =1 =1= II III 

EYGI FLKRTSSERFIAVIiNEHILTELENSKFSI]jDETOEKTSFIX3VALTLS

230 240 250 260 270 280 290 



1206 1236 1266 1296 1326 1356 1386 1416 

DQIvWEmSSSKKALYFGGGAVSTIKRSRTRTRAMMTAISDRLKOTDSVFIVGHRKLDMDALGASVGMQFFASNIVNASY 
||: :: : | ::|| ||:| | | : |: : : :| |:||: |||::||::|: |

DQVAIKLPNGKVK- - FYGGKTNPMEKRTRVRARVI SHALKEIVTESSNVI IMGHKFPDMDS IGAAIGILKVAQANNKDGF 
310 320 330 340 350 360 370 

1446 1476 1500 1530 1560 1590 1620 1650 

WYDPNDMNSDIERAIDYLQEDGE--TRLVSVERAFELITQNSLLVMVDHSKTALTLSKEFFNKFADVIVVDHHRRDEDF

=1 III : I ::| I === I =1=== I I h -llhll | :| : : : || ::|:||||| |:| 
IVIDPNQIGSSVQRLIGEIKKYEELWSRFITPEEAMEISNDDTLLVIVDTHKPSLVMEERLVNKIEHIWIDHHRRGEEF 
390 400 410 420 430 440 450 

1680 1710 1740 1770 1800 1830 1860 1890

PKNAVLSFIESGASSASELVTELIQFQQAKDKLSRSQASILMAGIMLDTRNFASNVTSRTFDVASYLRGLGSNSMAIQKI 

= : =1 = = l III :||||||:::| = l = = =1= I = I I I = = I I = = I = lllll Mill |:: = : :|| 
IRDPLLVYMEPYASSTAELVTELLEYQPKRLKINMIEATALIAGIIVDTKSFSLRTGSRTFDAASYLRAKGADTVLVQKF 
470 480 490 500 510 520 530 


1920 1950 2004 2034 2064 2091 2121 

SATDFDEYRLINELILKGERIYDNIIVAT- -GEEHKVYSHVIASKAADTMLTMAGIEATFVITK-NSSNIGISARSRNNI 
I I :|| III :|: I = = =1= ::|||::|:|: =11=1 = = = : ill : 

LKETVDSYIKRAKLIQHTVLYKDNIAIASLPENEEEYFDQVLIAQAADSLLSMSEVEASFAVARRDEQTVCISARSLGEV 
550 560 570 580 590 600 610

2151 2181 2211 2241 2271 2301 2331 2361 

NVQRIMEKLGGGGHFSFAACQIQDKSWQVRRMLLEIIDEDLRENSTVENRRD*LR*KLFFYKMLRGICEKKVRLRKYLLV 
III III I MM:: || |: || : | ||| s 
NVQIIMEALEGGGHLTNAATQLSGISVSEALERLKHAIDEYFEGGVQR

630 640 650 
SEQ ID 8718 (GBS10) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 1 (lane 6; MW 98kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 2 (lane 7; MW 73kDa). 

The GST-fusion protein was purified as shown in Figure 189, lane 3. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1037 

A DNA sequence (GBSx1109) was identified in S.agalactiae <SEQ ID 3205> which encodes the amino
acid sequence <SEQ ID 3206>. Analysis of this protein sequence reveals the following: 

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4643 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA43972 GB:X62002 ribosomal protein L9 [Bacillus
stearothermophilus]

Identities = 80/149 (53%), Positives = 105/149 (69%), Gaps = 2/149 (1%)

Query: 1 MKVIFLQDVKGKGKKGEVKEVPTGYAQNFLLKKNLAKEATTQAIGELKGKQKSEEKAQAE 60
MKVIFL+DVKGKGKKGE+K V GYA NFL K+ LA EAT + L+ +++ E++ AE
Sbjct: 1 MKVIFLKDVKGKGKKGEIKNVADGYANNFLFKQGLAIEATPANLKALEAQKQKEQRQAAE 60

Query: 61 ILAQAKELKTQLESETTRVQFIEKVGPDGRTFGSITAKKIAEELQKQYGIKIDKRHIDLD 120

LA AK+LK QLE T + KG GR FGSIT+K+IAE LQ Q+G+K+DKR I+L
Sbjct: 61 ELANAKKLKEQLEKLTVTIP--AKAGEGGRLFGSITSKQIAESLQAQHGLKLDKRKIELA 118






Query: 121 HTIRAIGKVEVPVKLHKQVSSQIKLDIKE 149 

IRA+G VPVKLH +V++ +K+ + E 
Sbjct: 119 DAIRALGYTNVPVKLHPEVTATLKVHVTE 147 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3207> which encodes the amino acid
sequence <SEQ ID 3208>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4630 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 119/150 (79%) , Positives = 138/150 (91%) 

Query: 1 MKVIFLQDVKGKGKKGEVKEVPTGYAQNFLLKKNLAKEATTQAIGELKGKQKSEEKAQAE 60 
MKVI FL DVKGKGKKGE+KEVPTGYAQNFL+KKNLAKEAT+Q+ IGELKGKQK+EEKAQAE 
Sbjct: 1 MKVIFLADVKGKGKKGEIKEVPTGYAQNFLIKKNLAKEATSQSIGELKGKQKAEEKAQAE 60

Query: 61 ILAQAKELKTQLESETTRVQFIEKVGPDGRTFGSITAKKIAEELQKQYGIKIDKRHIDLD 120 

ILA+A+ +K L+ + TRVQF EKVGPDGRTFGSITAKKI+EELQKQ+G+K+DKRHI LD 
Sbjct: 61 ILAFAQAVKAVLDEDKTRVQFQEKVGPDGRTFGSITAKKISEELQKQFGVKVDKRHIVLD 120 

Query: 121 HTIRAIGKVEVPVKLHKQVSSQIKLDIKEA 150

H IRAIG +EVPVKLHK+V+++IKL I EA 
Sbjct: 121 HPIRAIGLIEVPVKLHKEVTAEIKLAITEA 150 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1038 

A DNA sequence (GBSx1110) was identified in S.agalactiae <SEQ ID 3209> which encodes the amino
acid sequence <SEQ ID 3210>. This protein is predicted to be DNA polymerase III delta prime subunit
(dnaB). Analysis of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -0.43 Transmembrane 204 - 220 ( 204 - 220) 

Final Results

bacterial membrane Certainty=0.1171 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2423> which encodes the amino acid
sequence <SEQ ID 2424>. Analysis of this protein sequence reveals the following: 
Possible site: 21 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.27 Transmembrane 210 - 226 ( 210 - 226)

Final Results

bacterial membrane Certainty=0.1107 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 397/450 (88%), Positives = 431/450 (95%), Gaps = 1/450 (0%) 

Query: 3 EVSELRVQPQDLLAEQAVLGSIFISPEKLIMVREFISPDDFYKYSHKVIFRAMITLADRN 62
EV+ELRVQPQDLLAEQ+VLGSIFISP+KLI VREFISPDDFYKY+HK+IFRAMITL+DRN

Sbjct: 8 EVAELRVQPQDLLAEQSVLGSIFISPDKLIAVREFISPDDFYKYAHKIIFRAMITLSDRN 67

Query: 63 DAIDAATVRNILDDQGDLQNIGGLGYIVELVNSVPTSANAEFYAKIVSEKAMLRDIISKL 122
DAIDA T+R ILDDQ DLQ+IGGL YIVELVNSVPTSANAE+YAKIV+EKAMLRDII++L
Sbjct: 68 DAIDATTIRTILDDQDDLQSIGGLSYIVELVNSVPTSANAEYYAKIVAEKAMLRDIIARL 127

Query: 123 TDTVNMAY-EGNDSDEIIATAEKALVDINEHSNRSGFRKISDVLKVNYENLELRSQQTSD 181

T++VN+AY E +E+IA E+AL+++NEHSNRSGFRKISDVLKVNYE LE RS+QTS+
Sbjct: 128 TESVNLAYDEILKPEEVIAGVERALIELNEHSNRSGFRKISDVLKVNYEALEARSKQTSN 187



Query: 182 VTGLPTGFRDLDRITTGLHPDQLIILAARPAVGKTAFVLNIAQNVGTKQNRPVAIFSLEM 241

VTGLPTGFRDLD+ITTGLHPDQL+ILAARPAVGKTAFVLNIAQNVGTKQ + VAIFSLEM 
Sbjct: 188 VTGLPTGFRDLDKITTGLHPDQLVILAARPAVGKTAFVLNIAQNVGTKQKKTVAIFSLEM 247 

Query: 242 GAESLVDRMLAAEGMVDSHSLRTGQLTDQDWNNVTIAQGALADAPIYIDDTPGIKITEIR 301

GAESLVDRMLAAEGMVDSHSLRTGQLTDQDWNNVTIAQGALA+APIYIDDTPGIKITEIR
Sbjct: 248 GAESLVDRMLAAEGMVDSHSLRTGQLTDQDWNNVTIAQGALAEAPIYIDDTPGIKITEIR 307

Query: 302 ARSRKLSQEVDDGLGLIVIDYLQLISGTRPENRQQEVSEISRQLKILAKELKVPVIALSQ 361
ARSRKLSQEVD GLGLIVIDYLQLI+GT+PENRQQEVS+ISRQLKILAKELKVPVIALSQ

Sbjct: 308 ARSRKLSQEVDGGLGLIVIDYLQLITGTKPENRQQEVSDISRQLKILAKELKVPVIALSQ 367 

Query: 362 LSRGVEQRQDKRPVLSDIRESGSIEQDADIVAFLYRDDYYRREGEEAEEIVEDNTVEVIL 421 
LSRGVEQRQDKRPVLSDIRESGSIEQDADIVAFLYRDDYYR+E ++AEE VEDNT+EVIL 
Sbjct: 368 LSRGVEQRQDKRPVLSDIRESGSIEQDADIVAFLYRDDYYRKECDDAEEAVEDNTIEVIL 427

Query: 422 EKNRAGARGTVKLMFQKEYNKFSSIAQFEE 451 
EKNRAGARGTVKLMFQKEYNKFSSIAQFEE 
Sbjct: 428 EKNRAGARGTVKLMFQKEYNKFSSIAQFEE 457

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
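
The Identities and Positives counts quoted for the alignments above can be read off the match line printed between each Query and Sbjct row. The following is a minimal Python sketch, assuming the usual BLAST-style convention that a letter in the match line marks an identical residue and "+" marks a conservative substitution. The function name count_matches is hypothetical and is not part of the original analysis; the input is the final alignment block of this Example.

```python
def count_matches(query, midline, sbjct):
    """Return (identities, positives, aligned_length) for one alignment block.

    Assumes the three rows are already cut to the same column range.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("rows must have equal length")
    # A letter in the match line marks a column where both residues agree.
    identities = sum(
        1 for q, m, s in zip(query, midline, sbjct) if m.isalpha() and q == s
    )
    # Positives are the identities plus the '+' (conservative) columns.
    positives = identities + midline.count("+")
    return identities, positives, len(query)


query = "EKNRAGARGTVKLMFQKEYNKFSSIAQFEE"
midline = "EKNRAGARGTVKLMFQKEYNKFSSIAQFEE"
sbjct = "EKNRAGARGTVKLMFQKEYNKFSSIAQFEE"
print(count_matches(query, midline, sbjct))
```

Summing the per-block counts over a whole alignment gives the numerators of the Identities and Positives figures; the printed percentages are not recomputed here because the rounding rule of the original program is not stated in the text.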

Example 1039 

A DNA sequence (GBSx1111) was identified in S.agalactiae <SEQ ID 3211> which encodes the amino
acid sequence <SEQ ID 3212>. Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4909 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 3213> which encodes the amino acid 
sequence <SEQ ID 3214>. Analysis of this protein sequence reveals the following: 






Possible site: 21 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3467 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 77/90 (85%) , Positives = 84/90 (92%) 

Query: 1 MSDAFADVAKMKKIKEDIKSHEGQMVELTLENGRKREKNKIGRLIEVYPSLFIVEYKDTA 60 
MSDAF DVAKMKKIKEDI++HEGQ+VELTLENGRKREKNKIGRLIEVY SLFI+EY D++

Sbjct: 11 MSDAFTDVAKMKKIKEDIRAHEGQLVELTLENGRKREKNKIGRLIEVYSSLFIIEYSDSS 70 

Query: 61 AVPGAIDNTYVESYTYSDILTEKTLIRYFD 90 
PGAIDN+YVESYTYSDILTEKTLIRY D 
Sbjct: 71 DTPGAIDNSYVESYTYSDILTEKTLIRYLD 100

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
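
The "Final Results" blocks list one certainty value per candidate location, and the location marked "Affirmative" is the one with the highest value. The sketch below shows one way such a block could be parsed; it is illustrative only. The names parse_final_results and predicted_location are hypothetical, and the regular expression is an assumption about the line layout that tolerates stray spaces inside the number.

```python
import re

# Assumed layout: "bacterial <location> ... Certainty=<value> (<verdict>)".
LINE = re.compile(
    r"^bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9. ]+?)\s*\((.+?)\)"
)


def parse_final_results(text):
    """Map each location name to a (certainty, verdict) pair."""
    results = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            location, raw_value, verdict = match.groups()
            # Remove spaces that extraction may have left inside the number.
            results[location] = (float(raw_value.replace(" ", "")), verdict)
    return results


def predicted_location(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda loc: results[loc][0])


sample = """bacterial cytoplasm Certainty=0.4909 (Affirmative) < succ>
bacterial membrane Certainty=0.0000 (Not Clear) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>"""

results = parse_final_results(sample)
print(predicted_location(results), results)
```

The sample values are those reported for SEQ ID 3212 in this Example.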

Example 1040 

A DNA sequence (GBSx1112) was identified in S.agalactiae <SEQ ID 3215> which encodes the amino
acid sequence <SEQ ID 3216>. This protein is predicted to be 30S ribosomal protein S4 (rpsD). Analysis of
this protein sequence reveals the following:



Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2937 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC00397 GB:AF008220 ribosomal protein S4 [Bacillus subtilis] 
Identities = 138/201 (68%), Positives = 158/201 (77%), Gaps = 1/201 (0%) 

Query: 1 MSRYTGPSWKQSRRLGLSLTGTGKELARRNYVPGQHGPNNRSKLSEYGLQLAEKQKLRFS 60 

M+RYTGPSWK SRRLG+SL+GTGKEL +R Y PG HGP R KLSEYGLQL EKQKLR 
Sbjct: 1 MARYTGPSWKLSRRLGISLSGTGKELEKRPYAPGPHGPGQRKKLSEYGLQLQEKQKLRHM 60 

Query: 61 YGLGEKQFRNLFVQATKAKEGTLGFNFMVLLERRLDNVVYRLGLATTRRQARQFVNHGHI 120

YG+ E+QFR LF +A K G G NFM+LL+ RLDNVVY+LGLA TRRQARQ VNHGHI
Sbjct: 61 YGVTERQFRTLFDKAGKIA-GKHGENFMILLDSRLDNVVYKLGLARTRRQARQLVNHGHI 119

Query: 121 LVDGKRVDIPSYRVTPGQVISVREKSMKVPAILEAVEATLGRPAFVSFDAEKLEGSLTRL 180

LVDG RVDIPSY V PGQ I VREKS + I E+VE P 4- + + FDAEKLEG+ TRL 

Sbjct: 120 LVT3GSRVIDIPSYLWPGQTIGTOEKSRNLSIIKESVEVNNFVPEYLTFDAEKLEGTFTRL 179 

Query: 181 PERDEINPEINEALVVEFYNK 201
PER E+ PEINEAL+VEFY++

Sbjct: 180 PERSELAPEINEALIVEFYSR 200

A related DNA sequence was identified in S. pyogenes <SEQ ID 3217> which encodes the amino acid 

sequence <SEQ ID 3218>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2937 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 201/203 (99%), Positives = 201/203 (99%) 

Query: 1 MSRYTGPSWKQSRRLGLSLTGTGKELARRNYVPGQHGPNNRSKLSEYGLQLAEKQKLRFS 60

MSRYTGPSWKQSRRLGLSLTGTGKELARRNYVPGQHGPNNRSKLSEYGLQLAEKQKLRFS
Sbjct: 1 MSRYTGPSWKQSRRLGLSLTGTGKELARRNYVPGQHGPNNRSKLSEYGLQLAEKQKLRFS 60

Query: 61 YGLGEKQFRNLFVQATKAKEGTLGFNFMVLLERRLDNVVYRLGLATTRRQARQFVNHGHI 120

YGLGEKQFRNLFVQATK KEGTLGFNFMVLLERRLDNVVYRLGLATTRRQARQFVNHGHI
Sbjct: 61 YGLGEKQFRNLFVQATKIKEGTLGFNFMVLLERRLDNVVYRLGLATTRRQARQFVNHGHI 120

Query: 121 LVDGKRVDIPSYRVTPGQVISVREKSMKVPAILEAVEATLGRPAFVSFDAEKLEGSLTRL 180
LVDGKRVDIPSYRV PGQVISVREKSMKVPAILEAVEATLGRPAFVSFDAEKLEGSLTRL

Sbjct: 121 LVDGKRVDIPSYRVDPGQVISVREKSMKVPAILEAVEATLGRPAFVSFDAEKLEGSLTRL 180

Query: 181 PERDEINPEINEALVVEFYNKML 203
PERDEINPEINEALVVEFYNKML
Sbjct: 181 PERDEINPEINEALVVEFYNKML 203

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1041 

A DNA sequence (GBSx1113) was identified in S.agalactiae <SEQ ID 3219> which encodes the amino
acid sequence <SEQ ID 3220>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 
Final Results

bacterial cytoplasm Certainty=0.4067 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98302 GB:AF243383 unknown; Orf3 [Lactococcus lactis subsp.
lactis]

Identities = 46/97 (47%), Positives = 69/97 (70%)

Query: 1 MNLNDRLKIEEMEEKYDSFKPRINALVEAIDDFQKHYEDYVKLREFYGSEDWFRLSEQTE 60

M+ D I++ME KYD+F P + L+++++ F Y +Y++LR FYGSE WF E + 
Sbjct: 1 fTONKDIELIQQ^NKYDTFMPvXTNLIDSOTKFNSIYM^IELRNFYGSEKWFEYMEIEK 60 

Query: 61 NNLKCGVLSEDQLFDFIGEHNELVGQFLDMSSQMYRH 97

+KCGVL+EDQLFD I +HNEL+G LD++S+MY++ 
Sbjct: 61 IPVKCGVLTEDQLFDMISDHNELLGVLLDLTSKMYKN 97 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3221> which encodes the amino acid 
sequence <SEQ ID 3222>. Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3465 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 48/98 (48%), Positives = 74/98 (74%)

Query: 1 MNLNDRLKIEEMEEKYDSFKPRINALVEAIDDFQKHYEDYVKLREFYGSEDWFRLSEQTE 60 

M D+L +E+ME+ Y++F P++ L+EA+D F++HYE+Y LR FY S++WFRL+ Q 
Sbjct: 1 MTKQDQLIWKMEQTYEAFSPKLANLIEALDAFKEHYEEYATLRNFYSSDEWFRLANQPW 60 


Query: 61 NNLKCGVLSEDQLFDFIGEHNELVGQFLDMSSQMYRHL 98 

+++ CGVLSED LFD IG+HN+L+ LD++ MY+H+ 
Sbjct: 61 DDIPCGVLSEDLLFDMIGDHNQLLADILDLAPIMYKHM 98 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1042 

A DNA sequence (GBSx1114) was identified in S.agalactiae <SEQ ID 3223> which encodes the amino
acid sequence <SEQ ID 3224>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0965 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04438 GB:AP001509 transcriptional regulator (TetR/AcrR
family) [Bacillus halodurans]

Identities = 47/181 (25%), Positives = 95/181 (51%), Gaps = 16/181 (8%) 

Query: 4 DTRREKTKRAIEAAMITLLKDQSFDEISTINLTKTAGISRSSFYTHYKDKYEMIDQYQQS 63
D R++ T+ ++ +++ L++++ I+ + A I+RS+FY+HY D Y+++ Q +
Sbjct: 6 DRRKKYTRMLLKESLMKLMQEKPLSNITIKEICDLADINRSTFYSHYTDLYDLLYQIEDE 65

Query: 64 LFNKV- EYI FDRNQFKKEDAL LEIFQFLDRESLFAALLTQNGTKEIQTYILNKLQ 117 

+ + E + M K E+AL L ++ +RES L ++ G Q K

Sbjct: 66 1 1 KDLSEALSS YNYTKDEEALQMTENLLVYIAWWRESC - QTLFSE YGDPS FQ KKV 119 

Query: 118 LMLSKELPVVNP---DATKSDINRLYYSVYLSHAIFGVYQMWITRGKKESPQQITQVLLSL 175
+ML+ + + P TK DI+ Y S+Y+ + + Q W+ G K+SP+++ ++++ L

Sbjct: 120 MMLAHDHVIKTPLVGKHTKPDISE-YVSLYIVNGSIHIVQSWLKNGLKQSPKEMAELIIKL 179

A related DNA sequence was identified in S.pyogenes <SEQ ID 3225> which encodes the amino acid 

sequence <SEQ ID 3226>. Analysis of this protein sequence reveals the following: 

Possible site: 48 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB04438 GB:AP001509 transcriptional regulator (TetR/AcrR 
family) [Bacillus halodurans] 
Identities = 47/180 (26%), Positives = 88/180 (48%), Gaps = 18/180 (10%)






Query: 4 RKENTKQAILKAMVMLLKTESFDDITTVKLSKRAGISRSSFYTHYKDKYEMIDYYQQTFF 63 

RK+ T+ + ++++ L++ + +IT ++ A 1+RS+FY+HY D Y+++ + 
Sbjct: 8 RKKYTRMLLKESLMKLMQEKPLSNITIKEICDLADINRSTFYSHYTDLYDLLYQIEDEII 67 

Query: 64 HKLEYI FEKKYQNKEQAFLEVFEFL QREQLLSSLLSANGTKEIQAFIINKVRLL- 117 

L K++ L++ E L + +L S G Q KV +L 
Sbjct: 68 KDLSEALSSYNYTKDEEALQMTENLLVYIANNRESCQTLFSEYGDPSFQ KKVMMLA 123 

Query: 118 ITTDLQDKFSTEELSQTEKEYQSIYLAHAFFGVCQSWIAKGKKESPQEMTQFVLKM 173

I T h K + ++S EY S+Y+ + + QSW+ G K+SP+EM + ++K+ 
Sbjct: 124 HDHVIKTPLVGKHTKPDIS EYVSLYIVNGSIHIVQSWLKNGLKQSPKEMAELIIKL 179 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 100/179 (55%), Positives = 134/179 (73%), Gaps = 2/179 (1%)

Query: 1 MVNDTRREKTKRAIEAAMITLLKDQSFDEISTINLTKTAGISRSSFYTHYKDKYEMIDQY 60

MVN R+E TK+AI AM+ LLK +SFD+I+T+ L+K AGISRSSFYTHYKDKYEMID Y
Sbjct: 1 MVN--RKENTKQAILKAMVMLLKTESFDDITTVKLSKRAGISRSSFYTHYKDKYEMIDYY 58

Query: 61 QQSLFNKVEYIFDRNQFKKEDALLEIFQFLDRESLFAALLTQNGTKEIQTYILNKLQLML 120

QQ+ F+K+EYIF++ KE A LE+F+FL RE L ++LL+ NGTKEIQ +I+NK++L++
Sbjct: 59 QQTFFHKLEYIFEKKYQNKEQAFLEVFEFLQREQLLSSLLSANGTKEIQAFIINKVRLLI 118

Query: 121 SKELPVVNPDATKSDINRLYYSVYLSHAIFGVYQMWITRGKKESPQQITQVLLSLLPQT 179

+ +L S + Y S+YL+HA FGV Q WI +GKKESPQ++TQ +L +L T 

Sbjct: 119 TTDLQDKFSTEELSQTEKEYQSIYLAHAFFGVCQSWIAKGKKESPQEMTQFVLKMLTST 177 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1043 

A DNA sequence (GBSx1115) was identified in S.agalactiae <SEQ ID 3227> which encodes the amino
acid sequence <SEQ ID 3228>. Analysis of this protein sequence reveals the following: 
Possible site: 58 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.5140 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10287> which encodes amino acid sequence <SEQ ID
10288> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12856 GB:Z99109 alternate gene name: yixE-similar to phage 
infection protein [Bacillus subtilis] 
Identities = 227/783 (28%) , Positives = 387/783 (48%) , Gaps = 60/783 (7%) 


Query: 45 KAIIKSPKLWITMAGVALIPTLYNVIFLSSMWDPYGNTKNLPVAVVNQDKSAKLNGKTIS 104 

K 1+ S KL I + + +P +Y+ +FL + WDPYG LPV WNQDK A G+ + 
Sbjct: 9 KDIWSKKLLIPIIAILFVPLIYSGVFLKAYTOPYGTVDQLPVVVVNQDKGATYEGEKLQ 68 

25 Query: 105 IGKDMEDNLSKNDSLDFHFTT-AKRAEKELEKGHYYMVITFPKDLSRKATTLMTEKPERL 163 

IG D+ L N++ D+HF+ ++ K+L YY+V+ P+D S+ A+T++ + P++L 
Sbjct: 69 IGDDLvTCELKDNNNFDWHFS^LDQSLKDLLNQKYYLVVEIPEDFSKNASTVLDKNPKKL 128 

Query: 164 NITYKTTKGRSFVASKMSETAANKLKDEV7AESITGTYTESVFKNMGSMKTGINKAADGSQ 223 
++ Y T G ++V + + E A +KLK V++ +T YT+ +F N + G++ A+ G++

Sbjct: 129 DLKYHTNAGSNYVGATIGEKAIDKLKASVSKEVTEQYTKVIFDNFKDIAKGLSDASSGAK 188 

Query: 224 ELLNGSNKLQDGSQTLTSNLDVLASSSQTFSGGANKLNSGINLYTDGVGTLSNGLETLSD 283 
++ +G+ ++GS L NL L St T S +L G T G+ +L + L D 
Sbjct: 189 KIDDGTKDAKNGSAQLKENLAKLKESTATISDKTAQLADGAAQVTSGIQSLDSSLGKFQD 248

Query: 284 GVTAYTTGVHKLSEGSQKLDDKSQALV EGSEKLTDGLQQLSQATQLKPEQERT 336 

+L+ GS +L K L+ +G+ LT+GL QL+ Q E+ 

Sbjct: 249 SSNQIYDKSSQLAAGSGELTSKMNELLAGLQNVQKGTPNLTNGLDQLNSKVQEGSEKAAK 308 


Query: 337 LQNLSDG- -LKNLNQI ITNLQSTATTDSDTNSKLFNFLSTIESSTKALMNTAAADKQKQM 394 

+ + + L L + NL+ + T + +L +F +++++ +A N + + 
Sbjct: 309 AEKIINALDLTKLETAVNNLEKSETAMKEFKKQLTDFENSLKNRDQAFKN--VINSSDFL 366 

Query: 395 TAVQST SAFKSLTPEQQSQITSAVTGTPTSAE-TIAANISSNIENMKTVLSEASSS 449

TA Q + S K L ++ PT+ + A I S++E++K +++ + 

Sbjct: 367 TAEQKSQLINSVEKKLPQVDAPDFDQILSQLPTADQLPDIATIKSSLEDVKAQVAQVKAM 426 

Query: 450 APSN NGSQNLQTLSGTANNLVLKAI SDLDKI QKLPTATKQLYQGSQTLTKGITDYT 505 

+ NG++ +Q D I +L ++Y GSQ LT G T T

Sbjct: 427 PEATSKLYNGAKTIQ DAIDRLTEGADKIYNGSQKLTDGQTKLT 469 

Query: 506 NAVGQLRKGAVTLDSKSNQLISGTQKASQGAQTLDSKSDQLRDGAGQLASGSDRIADGSN 565 
+G+ K + S QL++G S Q+ G +L GS ++ GS+ 

Sbjct: 470 AGIGEYNKQFAKAKAGSEQLVTG SSQVSGGLFKLLDGSKQVQSGSS 515

Query: 566 KLAGGGHQLTDGLTELSGGVSQLSSSLGKAGDQLSMVSVNKDNANAVSSPVTIKHEDYDS 625 

KLA G L GL +L G +LSS LADQ + + +PVK+ S 

Sbjct: 516 KLADGSASLDTGLGKLLDGTGELSSKLKDAADQTGDIDADDQTYGMFADPVKTKDDAIHS 575 



Query: 626 VDTNGVGMAPYMISVALMVVALSANVIFAKALSGKEPANRFSWAKNK LLINGFIATL 682 

V G G+ PY++S+ L V + V+F + P N F W +K +++ G I +L 
Sbjct: 576 VPNYGTGLTPYILSMGLYVGGIMLTVVFPLKEASGRPRNGFEWFFSKFNVMMLVGIIQSL 635 






Query: 683 -AATILFFAVQFIGLKPDYPGKTYFIILLTAWTLMALVTALVGWDNRYGSFLSLLILLFQ 741 
AT+L IGL+ + + Y ++T+ +A++ h G F++++IL4 Q

Sbjct: 636 IVATVLLLG IGLEVESTWRFYVFTIITSIAFIAIIQFIiATTMGNPGRFIAVIILVLQ 692 

Query: 742 LGSSAGTYPIELSPKFFQTIQPFLPMTYSVSGLRETISLTGDVNHQWRMLVIFLVSSMIL 801 
LG+S GT+P+EL P F+Q I LPMTYS++G R IS GD + W+M + + ++++

Sbjct: 693 LGASGGTFPLELLPNFYQVIHGALPMTYSINGFRAVIS-NGDFGYMWQMAGVLIGIALVM 751 

Query: 802 ALL 804 
L 

Sbjct: 752 IAL 754

A related DNA sequence was identified in S.pyogenes <SEQ ID 2017> which encodes the amino acid
sequence <SEQ ID 2018>. Analysis of this protein sequence reveals the following:

Possible site: 26 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Certainty=0.4715 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 360/779 (46%), Positives = 508/779 (64%), Gaps = 32/779 (4%)

Query: 40 MLDELKAIIKSPKLWITMAGVALIPTLYNVIFLSSMWDPYGNTKNLPVAVVNQDKSAKLN 99

ML+ELK +IK+PKL ITM GVAL+P LYN+ FL SMWDPYG +LP+AWN DK AK 
Sbjct: 1 MLEELKTLIKNPKLMITMIGVALVPALYNLSFMSIS^ 60 

Query: 100 GKTISIGKDMEDNLSKNDSLDFHFTTAKRAEKELEKGHYYMVITFPKDLSRKATTLMTEK 159 
K+++IG DM D +SK+ L++HF +AK+A++ L++G YYMVIT P+DLS++A TL+ +

Sbjct: 61 DKSLTIGNDI^KMSKSKDLEYHWSAKQftQEGLKEGDYYMVITLPEDLSQRAATLLNPE 120 

Query: 160 PERLNITYKTTKGRSFVASKMSETAANKLKDEVAESITGTYTESVFKNMGSMKTGINKAA 219 
P++L I Y+T+KG VA+KM ETA KLK+ V+++IT TYT +VF +M +++G+ +A+ 
Sbjct: 121 PQKLTIRYQTSKGHGMVAAKMGETAMAKLKESVSQNITKTYTSAVFSSMTDLQSGLKEAS 180

Query: 220 DGSQELLNGSNKLQDGSQTLTSNLDVLASSSQTFSGGANKLNSGINLYTDGVGTLSNGLE 279 

GSQ L +G+ Q GSQTL++NL L +SQ F G +L SG+ YTDGV + NGL 
Sbjct: 181 AGSQALASGAKTAQAGSQTLSTNLAALTGASQQFQQGTGRLTSGLTTYTDGVNQVKNGLG 240 


Query: 280 TLSDGVTAYTTGVT3KLSEGSQKLDDKSQALVEGSEKLTDGLQQLSQATQLKPEQERTLQN 339 

TLS + Y GV +LS+G+ +L+ GL QL+QAT L E+ + +Q+ 

Sbjct: 241 TLSTDIPNYLNGVSRLSQGASQLNQ GLSQLTQATTLSDEKAKGIQS 286 

Query: 340 LSDGLKNLNQIITNLQSTATTDSDTN SKLFNFLSTIESSTKALMNTAAftDKQKQMTA 396

L GL LNQ IL++T N +LNLI+K++ A + ++++A 
Sbjct: 287 LIVGLPVLNQGIQQLNTELSTLQPPNLNADELGNSLGAIAQAAKQVIAEETAAQNEELSA 346 

Query: 397 VQSTSAFKSLTPEQQSQITSAWGTPTSAETIAAN-ISSNIENMKTVLSEASSSAPSNNG 455 
+Q+TS ++SLT EQQ ++ +A++ + S AA I S+++ + T L S S

Sbjct: 347 LQATSVYQSLTAEQQGELAAALSQSDKSQTVSAA.QTILSSVQTLSTSLQSLSQEDQSKQL 406 

Query: 456 SQNLQTLSGTANNLVLKAISDLDKIQKLPTATKQLYQGSQTLTKGITDYTNAV GQL 511 

Q + ++ AN Q LP A+ L + S L K V QL 

Sbjct: 407 EQLKEAVAQIANQ SNQALPGASSALTELSTGLAKVNGSLNQQVLPGSNQL 456

Query: 512 RKGAVTLDSKSNQLISGTQKASQGAQTLDSKSDQLRDGAGQLASGSDRIADGSNKLAGGG 571 

G L+ + + SG K S+GA L SKS +L DG+ QL+ G+ ++ADGS++L+ GG 
Sbjct: 457 TTGLAQLNRYNTAIGSGVIKLSEGANALSSKSGELLDGSHQLSEGATKLADGSSQLSQGG 516 

Query: 572 HQLTDGLTELSGGVSQLSSSLGKAGDQLSMVSVNKDNANAVSSPVTIKHEDYDSVDTNGV 631

HQLT GLTELS G+S L+ SL KA QLS+VSV NA AV+ P+ + +D D V TNG+ 
Sbjct: 517 HQLTSGLTELSTGLSTIMGSlAKASQQLSIiVSVTDKNAKAVAKPLVl^KDKDGVKTNGI 576 

Query: 632 GMAPYMISVALMVVALSANVIFAKALSGKEPANRFSWAKNKLLINGFIATLAATILFFAV 691

GMAPYMI+V+LMWALS NVIFA +LSG+ +++ WAK K +INGFI+T+ + +L+ A+ 
Sbjct: 577 GMAPVMIAVSLMWALSTNVIFANSLSGRPWDKJTOWAKQKFVINGFISTMGSIVLYLAI 636 

Query: 692 QFIGLKPDYPGKTYFIILLTAWTLMALVTALVGWDNRYGSFLSLLILLFQLGSSAGTYPI 751 
Q +G + Y +T I+L+ WT MALVTALVGWD+RYGSF SL++LL Q+GSS G+YPI

Sbjct: 637 QLLGFFJ^YGMETLGFIMLSGWTFMALVTALVGWDDRYGSFASLVMLLLQVGSSGGSYPI 696 

Query: 752 ELSPKFFQTIQPFLPMTYSVSGLRETISLTGDVNHQWRMLVIFLVSSMILALLIYRKQE 810
ELS FFQ + PFLPMTY VSGLR+TISL+G + + ++L FL++ M+LALLIYR ++
Sbjct: 697 ELSGAFFQKLHPFLPMTYWSGLRQTISLSGHIGVEVKVLTGFLLAFMVLALLIYRPKK 755

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
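
The alignment summaries in this example (for instance "Identities = 360/779 (46%)") follow a regular pattern, so they can be collected programmatically. The following is a minimal Python sketch, not part of the original analysis; the function names `parse_alignment_summary` and `exact_fraction` are hypothetical. Because the rounding convention of the alignment tool is not stated here, the printed percentage is stored as given rather than recomputed.

```python
import re
from fractions import Fraction

# Matches fields such as "Identities = 360/779 (46%)"; spacing is tolerated
# because the listing is not consistent about it.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Map each field name to (count, aligned_length, percent_as_printed)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


def exact_fraction(summary, field):
    """Return count/length as an exact fraction, ignoring the printed percent."""
    count, length, _ = summary[field]
    return Fraction(count, length)


# Summary line of the GAS/GBS alignment shown above.
summary = parse_alignment_summary(
    "Identities = 360/779 (46%), Positives = 508/779 (64%), Gaps = 32/779 (4%)"
)
```

Keeping the count and the aligned length as integers lets alignments from different examples be compared without depending on the printed percentages.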

Example 1044 

A DNA sequence (GBSx1116) was identified in S.agalactiae <SEQ ID 3229> which encodes the amino
acid sequence <SEQ ID 3230>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2664 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
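
The "Final Results" block of each example lists one certainty value per predicted location. As an illustration only, the Python sketch below reads such a block and picks the location with the highest certainty. The function name `parse_final_results` is hypothetical, and treating the highest certainty as the predicted location is an assumption based on how the listings pair the top value with "(Affirmative)".

```python
import re

# One line looks like: "bacterial cytoplasm --- Certainty=0.2664 (Affirmative) < succ>".
# Stray spaces inside the number (an extraction artifact) are tolerated.
LINE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Map each location to (certainty, verdict) for one Final Results block."""
    results = {}
    for location, value, verdict in LINE.findall(text):
        results[location] = (float(value.replace(" ", "")), verdict.strip())
    return results


# Values copied from Example 1044 above.
BLOCK = """
bacterial cytoplasm --- Certainty=0.2664 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(BLOCK)
# Assumption: the location with the highest certainty is the predicted one.
best_location = max(results, key=lambda name: results[name][0])
```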

Example 1045 

A DNA sequence (GBSx1117) was identified in S.agalactiae <SEQ ID 3231> which encodes the amino
acid sequence <SEQ ID 3232>. Analysis of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -9.45 Transmembrane 48 - 64 ( 45 - 69)
INTEGRAL Likelihood = -1.49 Transmembrane 71 - 87 ( 71 - 87)

Final Results

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9441> which encodes amino acid sequence <SEQ ID 9442> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA25222 GB:M87483 ORF 1 [Lactococcus lactis]

Identities = 50/88 (56%), Positives = 66/88 (74%), Gaps = 1/88 (1%) 

Query: 2 TGKIFSMSKEELSYLPVIKLFKNQGVYNGLIGLFLLYGLYISQNQ-EIVAVFLINVLLVA 60 
T ++F+M KEEL V LFKNQG+YNGLIGL L+Y ++ S Q EIV + LI ++LVA
Sbjct: 32 TSRVFNMGKEELERSSVQTLFKNQGIYNGLIGLGLIYAIFFSSAQLEIWLLLIYIILVA 91 

Query: 61 IYGALTVDKKILLKQGGLPILALLTFLF 88 
+YG+LT +KKI+L QGGL ILAL++ F

Sbjct: 92 LYGSLTSNKKIILTQGGLAILALISSFF 119 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8719> and protein <SEQ ID 8720> were also identified. Analysis of this 
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: 4.19 
GvH: Signal Score (-7.5): -3.99 
Possible site: 38 
>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 3 value: -9.45 threshold: 0.0 

INTEGRAL Likelihood = -9.45 Transmembrane 87 - 103 ( 84 - 108) 
INTEGRAL Likelihood = -1.49 Transmembrane 110 - 126 ( 110 - 126) 
INTEGRAL Likelihood = -0.37 Transmembrane 13 - 29 ( 13 - 29) 
PERIPHERAL Likelihood = 0.47 65

modified ALOM score: 2.39 

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.4779 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF00610(328 - 681 of 981) 

SP|Q02009|YTRP_LACLA(1 - 119 of 119) HYPOTHETICAL 13.3 KDA PROTEIN IN TRPE 5'REGION. 
GP|551879|gb|AAA25222.1| |M87483 ORF 1 {Lactococcus lactis} PIR | S35123 | S35123 hypothetical
protein (trpE 5' region) - Lactococcus lactis subsp. lactis
%Match = 19.9

%Identity = 58.8 %Similarity = 77.3

Matches = 70 Mismatches = 26 Conservative Sub.s = 22 

114 144 174 204 234 264 294 324 

SPKFFQTIQPFLPMTYSVSGLRETISLTGDVNHQWRMLVIFLVSSMILALLIYRKQED**KVSSDRLTV*YGMSKYLGGE

354 384 414 444 474 504 534 561 

DMSTLTIIIATLTALEHFYIMYLETLATQSNMTGKIFSMSKEELSYLPVIKLFKNQGVYNGLIGLFLLYGLYISQNQ-EI 

h lllh: I III I lllllll: II I I "hi llll I I I I I I h I I I I I I I h I = = I Ml 

MTILTIILSLLVALEFFYIMYLETFATSSKTTSRVFNMGKEELERSSVQTLFKNQGIYNGLIGLGLIYAIFFSSAQLEI
10 20 30 40 50 60 70 

591 621 651 681 711 741 771 801 

VAVFLINVLLVAIYGALTvDKKILLKQGGLPILALLTFLF*YYLAYR 

| ::|| ::||hlhll =111 = 1 llll 1111 = = =1

VRLLLIYI ILVALYGSLTSNKKI ILTQGGLAILALISSFF 
90 100 110 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.
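
The transmembrane report in this example pairs each "INTEGRAL Likelihood" line with two residue ranges, and the "ALOM program" line gives a count and a value. In the listing above the count (3) equals the number of INTEGRAL lines and the value (-9.45) equals the most negative likelihood. The Python sketch below parses the lines on that reading, which is an inference from the listing rather than documented program behaviour. The names `Segment`, `core` and `extended` are hypothetical labels; the source does not name the two ranges.

```python
import re
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    likelihood: float
    core: Tuple[int, int]      # first range on the line (hypothetical label)
    extended: Tuple[int, int]  # range in parentheses (hypothetical label)


INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text: str) -> List[Segment]:
    """Extract every INTEGRAL line of a report as a Segment."""
    return [
        Segment(float(lk), (int(a), int(b)), (int(c), int(d)))
        for lk, a, b, c, d in INTEGRAL.findall(text)
    ]


# Lines copied from the report for SEQ ID 8720 above.
REPORT = """
INTEGRAL Likelihood = -9.45 Transmembrane 87 - 103 ( 84 - 108)
INTEGRAL Likelihood = -1.49 Transmembrane 110 - 126 ( 110 - 126)
INTEGRAL Likelihood = -0.37 Transmembrane 13 - 29 ( 13 - 29)
"""

segments = parse_segments(REPORT)
count = len(segments)                               # compare with "count: 3"
strongest = min(s.likelihood for s in segments)     # compare with "value: -9.45"
```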

Example 1046 

A DNA sequence (GBSx1118) was identified in S.agalactiae <SEQ ID 3233> which encodes the amino
acid sequence <SEQ ID 3234>. Analysis of this protein sequence reveals the following: 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3140 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10285> which encodes amino acid sequence <SEQ ID 
10286> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12447 GB:Z99107 similar to arylesterase [Bacillus subtilis] 
Identities = 37/91 (40%) , Positives = 56/91 (60%) 

Query: 13 KDGSDIYYRWGQGQPIVFLHGNSLSSRYFDKQIAYFSKyYQVIVMDSRGHGKSHAKLNT 72 

+D + +YY G G PI+F+HG +S ++F KQ + S YQ I +D RGHG+S L+ 
Sbjct: 7 EDQTRLYYETHGSGTPILFIHGVLMSGQFFHKQFSVLSANYQCIRLDLRGHGESDKVLHG 66 

Query: 73 I S FRQIAVDLKDI LVHLE IDKVI LVGHSDGA 103 

+ Q A D+++ L +E+D V+L G S GA 
Sbjct: 67 HTISQYARDIREFLNAMELDHWLAGWSMGA 97 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1047 

A DNA sequence (GBSx1119) was identified in S.agalactiae <SEQ ID 3235> which encodes the amino
acid sequence <SEQ ID 3236>. This protein is predicted to be an integral membrane protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have an uncleavable N-term signal seq 

Final Results

bacterial membrane --- Certainty=0.6158 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10283> which encodes amino acid sequence <SEQ ID 
10284> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA24464 GB:D85082 YfiX [Bacillus subtilis] 
Identities = 190/596 (31%) , Positives = 324/596 (53%) , Gaps = 31/596 (5%) 

Query: 246 IVSLIPGGLGSFELVLFTGFAAEGLPKETV\aWLLLYRIAYYIIPFFAGIYFFIHYLGSQ 305
++SL+PGG GSF+L+ G G +E +V ++LYRLAY IPF G++F L
Sbjct: 1 MISLVPGGFGSFDLLFLLGMEQLGYHQEAIVTSIVLYRLAYSFIPFILGLFFAAGDLTEN 60

Query: 306 INQRYENVPK ELVSTVLQTMVSHLMRILG AFLI FSTAFFENITYIMWLQKLG 357
+R E P+ E + +L + L+RIL + ++F + + + +L
Sbjct: 61 TMKRLETNPRIAPAIETTNVLLWQRAVLVRILQGSLSLIVFVAGLIVLASVSLPIDRLT 120

Query: 358 LDP - LQEQMLWQFPGLLLGVCFILLARTID- -QKVKNAFPIAI IWITLTLFYLNLGHI SW 414
+ P + L F GL L ILL 1+ ++ K ++ +AI + + L ++
Sbjct: 121 VIPHIPRPALLLFNGLSLSSALILLILPIELYKRTKRSYTMAITALVGGFVFSFLKGLNI 180

Query: 415 RLSFWFILLLLGLLVIKPTLYKKQFIYSVIEERIKDGIIIVSLMGVLFY IAGLLFPI 470
F ++++ L+++K ++Q Y+ + I V+L V + IAG ++
Sbjct: 181 SAI FVLPMI I VLLVLLKKQFVREQASYTLGQLI FAVALFTVALFNYNLIAGFIWDR 236

Query: 471 RAHITGGSIERLHYIIAWEPIAIATL 1 LTLVYLCLVKI LQGKSCQIGDVFNVDRYK 526 

4- + +++ + I AT4 1+ L +L + ++ IG+ + +R 
Sbjct: 237 MKKV LRHEYFVHSTSHITHATIMAIIIVPLFFLIFTWYHKRTKPIGEKADPERLA 292 

Query: 527 KLLQAYGGSSDSGIAFIiNDKRLYWYQKNGEDCTAFQFVIVNNKCLIMGEPAGDDTYIREA 586 

L GG++ S L FL DKR Y + +G + F + + +++G+P+G 
Sbjct: 293 AFLNEKGGNALSHLGFLGDKRFY-FSSDGNALLLFGKIA--RRLWLGDPSGQRESFPLV 349 

Query: 587 IESFIDDADKLDYDLVFYSIGQKLTLLLHEYGFDFMKVGEDALVNLETFTLKGNKYKPFR 646 

+E F+++A + 4- ++FY I ++ L H++G++F K+GE+A V+L TFTL G K R 
Sbjct: 350 LEEFMIFAHQKGFSVLFYQIEREDMALYHDFGYNFFKLGEEAYVDLNTFTLTGKKKAGLR 409 

Query: 647 NALNRVEKDGFYFEWQSPHSQELLNSLEEISNTWLEGRPEKGFSLGYFNKDYFQQAPIA 706 

NR E++ + F V PS L L++IS+ WL + EKGFSLG+F+ Y Q+APIA 
Sbjct: 410 AINNRFEREEYTFHVDHPPFSDAFLEELKQISDEWLGSKKEKGFSLGFFDPSYLQKAPIA 469 

Query: 707 LVKNAEHEWAFANIMPNYEKSIISIDLMRHDKQKIPNGVMDFLFLSLFSYYQEKGYHYF 766 

+KNAE E+VAFAN+MP Y++ IS+DLMR+ + PNG+MD LF+ +F + +E+G F 
Sbjct: 470 YMKNAEGEIVAFANVMPOTQEGEISVDL^Y-RGDAPNGII^ALFIRMFLWAKEEGCTSF 528 

Query: 767 DLGMAPLSGVGRVETSFAKERMAYLVYHFGSHFYSFNGLHKYKKKFTPLWSERYIS 822 

++GMAPL+ VG TSF ER A ++++ + YSF+GL +K+K+ P W +Y++ 
Sbjct: 529 NMGMAPLANVGTAFTS FWSERFAAVI FNNVRYMYS FSGLRAFKEKYKPEWRGKYLA 584 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8721> and protein <SEQ ID 8722> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 9 
McG: Discrim Score: 9.22 
GvH: Signal Score (-7.5): -7.66 

Possible site: 58 
>>> Seems to have an uncleavable N-term signal seq 
PERIPHERAL Likelihood = 1.06 558
modified ALOM score: 3.08

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6158 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

ORF00608(967 - 2787 of 3141) 

OMNI |NT01BS0989 (20 - 633 of 652) putative integral membrane protein, putative 
%Match =14.6 

%Identity =33.0 %Similarity = 58.0 
Matches = 201 Mismatches = 244 Conservative Sub.s = 153



825 855 885 915 945 975 1005 1035 

YyLVLIGASMYFPVIYWISGHKGSHYFGDMPSSTRIKLGWSFFEWGCAAAAFIIIGYLMGIHLPVYKILPLFCIGCa.VG 

: III! : :: :| I I 

LELQLLNGSWPGPVIYFALFAMGIHADIRYVFGvFvIAAIGG

10 20 30 40 



1065 1095 1125 1155 1185 1215 1245 1260 

IVSLIPGGLGSFELVLFTGFAMGLPKETWAWLLLYRIiAYYIIPFFAGIYFFIHYLGSQINQRYENVPK ELVST 

::||:|||:|||:|::: | ] :] :| ::|||||| ]|| |::| | :| | |: | :

MISLVPGGFGSFDLLFLLGMEQLGYHQEAIVTSIVLYRLAYSFIPFILGLFFAAGDLTENTMKRLETNPRIAPAIETTNV 
60 70 80 90 100 110 120 

1290 1311 1341 1371 1398 1428 1458 1482 

VLQTMVSHLMRIL-GAF--LIFSTAFFENITYI^LQKLGLDP-LQEQMLWQFPGLLLGVCFILIARTID- -QKVKNAFP

:| : |:||| | = : =:| : : : : :| : | : I Ml I :||| |: :: | :» 
LLWQRAVLTOILQGSLSLIVFVAGLIVLASVSLPIDRLWIPHIPRPALLLFNGLSLSSALILLILPIELYKRTKRSYT 
140 150 160 170 180 190 200 



1512 1542 1572 1602 1632 1659 1689 1719

IAIIWITLTLFYLNLGHISWRLSFWFILLLLGLLVIKPTLYKKQFIYSWEE-RIKDGIIIVSLMGVLFYIAGLLFPIRAH 






MAITALVGGFVFSFLKGLN- -ISAIFVLPMIIVIjIiV- - -LLKKQFVREQASYTLGQLIFAVALFTVALFNYNLIAGFIWD 
220 230 240 250 260 270 

1749 1779 1797 1827 1857 1887 1917 1947 

1TGGS IERLHYI IAWEPIALAT LILTL VYLCLWILO^KSCQIGDVFNVDRYKKLLQAYGGSSDSGLAFLNDKRLY 



RMKKOTiRHEYFVHSTSHITHATIMAIIIVPLFFLIFlVWHKRTKPIGEKADPERLAAFIiNEKGGNALSHLGFLGDKRFY 
290 300 310 320 330 340 350

1977 2007 2037 2067 2097 2127 2157 2187 

WYQKNGEDCTAFQFVIVNNKCLIMGEPAGDDTYIREAIESFIDDADKLDYDLVFYSIGQKLTLLLHEYGFDFMKVGEDAL 
: :| : | = ': :::|:|:| :| |:::| : : :>|| I i: | |::|::| |:||:| 

-FSSDGNALLLF--GKIARRLVVLGDPSGQRESFPLVLEEFIjNFJffiQKGFSvLFYQIEREDMALYHDFGYNFFKLGEEAY
370 380 390 400 410 420 430 



2217 2247 2277 2307 2337 2367 2397 2427 

VNLETFTLKGNKYKPFRNAI^WKDGFYFEWQSPHSQEl^SLEEISNTWLEGRPEKGFSLGYFNKDYFQQAPIALVK 

|:| llll II =1 II 1:= ■■ I I : I I =1 l"ll= II = 1111111=1= 1 = 1 = 1(11 : I 
VDIjNTFTLTGKKKAGLRAINNRFEREEYTFHVDHPPFSDAFLEELEQISDEWLGSKKEKGFSLGFFDPSYLQKAPIAYMK 
450 460 470 480 490 500 510 



2457 2487 2517 2547 2577 2607 2637 2667 

NAEHEWAFANIMPNYEKSIISIDLMRHDKQKIPNGVMDFLFIiSLFSYYQEKGYHYFDLGMAPLSGVGRVETSFAKERMA

III 1=11111=11 !== 11=1111= = 111=11 11= =1 = =1=1 l==lllll= II III II I 

NAEGEIVAFANVMPMYQEGEISVDLMRY-RGDAPNGIMDALFIRM^^ 

530 540 550 560 570 580 590 

2697 2727 2757 2787 2817 2847 2877 2907

YLVYHFGSHFYSFNGLHKYKKKFTPLWSERYISCSRSSV^ICaiCIALLMEDSKIKIW*ALFGN*KEHvMRHALFKSFNT 




AVIFNNTOYMYSFSGLRAFKEKYKPEWRGKYIiAYRKNRSLSVTMFLVTO 

610 620 630 640 650 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1048

A DNA sequence (GBSx1120) was identified in S.agalactiae <SEQ ID 3237> which encodes the amino
acid sequence <SEQ ID 3238>. This protein is predicted to be a choline transporter. Analysis of this protein
sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane --- Certainty=0.5097 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD45530 GB:AF162656 choline transporter [Streptococcus pneumoniae] 
Identities = 326/505 (64%) , Positives = 409/505 (80%) , Gaps = 1/505 (0%) 

Query: 1 MTTLITTFQERFGDWTQSLIEHLQLSLLTLILATLIAIPLGIIISHYKKISHWLQITGI 60 

MT LI TFQ+RF DW +L +HLQLSLLTL+LA L+AIPL + + +++K++ VLQI GI 
Sbjct: 1 MTNLIATFQDRFSDWLTALSQHLQLSLLTLLLAILIjAIPIjAVFLRYHEKLADWVLQIAGI 60 

Query: 61 FQTIPSLALLGLFIPFMGIGWPAWALIIYALFPILQNTVTVLMQIDANLIEAATAFGM 120 

FQTIPSLALLGLFIP MGIGT+PA+ AL+IYA+FPILQNT+T L ID NL EA AFGM 
Sbjct: 61 FQTIPSLALLGLFIPLMGIGTLPALTALVIYAIFPILQNTITGLKGIDPNLQEAGIAFGM 120 

Query: 121 TRWERLKKFELALSMPVIISGIRTASVMIIGTATLASLIGAGGLGSFILLGIDRNNPSLI 180 

TRWERLKKFE+ L+MPVI+SGIRTA+V+IIGTATLA+LIGAGGLGSFILLGIDRNN SLI 
Sbjct: 121 TRlffiRLKKFEIPLAMPVIMSGIRTAAVLIIGTATLAALIGAGGLGSFILLGIDRNNASLI 180 

Query: 181 LIGAISSAVLAIIFSGLIGLLEKARLRTIAVSGILLLAGLGLSYAPKWMPGTNTATITVA 240 

LIGA+SSAVLAI F+ L+ ++EKA+LRTI L+ LGLSY+P + + +A 

Sbjct: 181 LIGALSSAVLAIAFNFLLKVMEKAKLRTIFSGFALVALLLGLSYSPALLVQKEKENLVIA 240 

Query: 241 GKLGTEPDILINMYKELIEDQTDIKVKLKPNFGKTTFLYQALKSGDIDLYPEFTGTITSS 300 

GK+G EP+IL NMYK LIE+ T + +KPNFGKT+FLY+ALK GDID+YPEFTGT+T S 
Sbjct: 241 GKIGPEPEILANIWKLLIEENTSMTATVKPNFGKTSFLYEALKKGDIDIYPEFTGTVTES 300 

Query: 301 LLKKPPKVSNNPKQVYlflljAKNGILKQDKLSLLSPMAYQNTYAVAVKKDYAEANQLKNISD 360 

LL+ PKVS+ P+QVY +A++GI KQD L+ L PM+YQNTYAVAV K A+ LK ISD 
Sbjct: 301 LLQPSPKVSHEPEQVYQVARDGIAKQDHLAYLKPMSYQNTYAVAVPKKIAQEYGLKTISD 360 

Query: 361 LKKLD-KLKAGFTLEFKDREDGSIGLQKHYGLNLDISTLEPALRYQAINSKDVNIIDAYS 419 

LKK++ +LKAGFTLEF DREDG+ GLQ YGLNL+++T+EPALRYQAI S D+ I DAYS 
Sbjct: 361 LKKVEGQLKAGFTLEFNDREDGNKGLQSMYGIJiraSVATIEPALRYQAIQSGDIQITDAYS 420 

Query: 420 TDSELIQYQLQILKDDKHLFPPYQGAPLLRQDTIKKYPQVTOCAIiNKLAGHITEKEMQEiyiN 479 

TD+EL +Y LQ+L+DDK LFPPYQGAPL+++ +KK+P++++ LN LAG ITE +M ++N 
Sbjct: 421 TDAELERYDLQVLEDDKQLFPPYC<^PLMKEALLKKHPELERVLNTLAGKITESQMSQLN 480 



Query: 480 YQVAVKHKSAATVAKQYLKAHHIIK 504 
YQV V+ KSA VAK++L+ ++K
Sbjct: 481 YQVGVEGKSAKQVAKEFLQEQGLLK 505 

There is also homology to SEQ ID 636. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1049 

A DNA sequence (GBSx1121) was identified in S.agalactiae <SEQ ID 3239> which encodes the amino
acid sequence <SEQ ID 3240>. This protein is predicted to be a choline transporter (opuBA). Analysis of this
protein sequence reveals the following:

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2345 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD45529 GB:AF162655 choline transporter [Streptococcus pneumoniae]

Identities = 139/236 (58%) , Positives = 178/236 (74%) 

Query: 1 MISFENVSKSYGDHTIIDNISCHIQRGEFFVLViGASGSGKTTILKMINRLIEPSQGAITL 60 
MI ++NV+ Y + ++ +++ 1+ GEF VLVG SGSGKTT+LKMINRL+EP+ G I + 
Sbjct: 1 MIEYKNVALRYTEKDVLRDVNLQIEDGEFMVLVGPSGSGKTTMLKMINRLLEPTDGNIYM 60

Query: 61 DGENITSLDLRQLRLETGYVLQQIALFPNLTVGENIELIPEMRGWSKGDQKKAASDLLDK 120 

DG+ I D R+LRL TGYVLQ IALFPNLTV ENI LIPEMKGWSK + K +LL K 
Sbjct: 61 DGKRIKDYDERELRLSTGYVLQAIALFPNLTVAENIALIPEMKGWSKEEITKKTEELLAK 120 


Query: 121 VGLPAKDYFNRYPHELSGGEQQRIGI LRAI VAKPKVLLMDEPFSALDP I SRRQLQD I TKQ 180 

VGLP +Y +R P ELSGGEQQR+GI +RA+ + +PK+ LMDEPFSALD ISR+QLQ +TK+ 
Sbjct: 121 VGLPVAEYGHRLPSELSGGEQQRVGI VRAMIGQPKIFLMDEPFSALDAISRKQLQVLTKE 180 

Query: 181 LQSELGITLVFVTHDMKEAMRLADRICVIKEGKIVQLDRPEIIQNNPSDQFVRTLF 236

L E G+T +FVTHD EA++LADRI V+++G+I Q+ PE I P+ FV LF 
Sbjct: 181 LHKEFGMTTIFVTHDTDFJU^KLADRIAVIiQDGEIRQVANPETILKAPATDFVADLF 236 

There is also homology to SEQ ID 644. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1050 

A DNA sequence (GBSx1122) was identified in S.agalactiae <SEQ ID 3241> which encodes the amino
acid sequence <SEQ ID 3242>. This protein is predicted to be a two-component response regulator. Analysis
of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -5.52 Transmembrane 49 - 65 ( 46 - 66) 

Final Results

bacterial membrane --- Certainty=0.3208 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>






The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06434 GB:AP001516 two-component response regulator [Bacillus halodurans] 
Identities = 101/305 (33%) , Positives = 152/305 (49%) , Gaps = 31/305 (10%) 

Query: 1 MKFYIIDDDPTITMILQDIIE-EDFNNTVTOVNNVSSKAYNELLIADVDIVLIDLLMPIL 59
M F+I DDD 1+ II IKE V + S LI VDI+LIDLLMP
Sbjct: 1 mFFITDDDVTVRSILAQIIEDEQLGQWGEREDGSELDGKRLNIKQvDILLIDLLMPNC 60

Query: 60 DGVTLVQKIYKQRSDLKFIMISQVKDNDLRQEAYKA.GIEFFINKPINIIEVKSWKRVTD 119
DG+ +QKI K K IMISQ++ +L EAY GIE +1 KPIN IEV SV+++V +
Sbjct: 61 DGLEAIQKI-KPEFKGKIIMISQIESKELISEAYLLGIEHYIMKPINKIEVLSVIRKVIN 119

Query: 120 TIEMQKKIiNTIQNLLENTPSYQKPITTSNLT KIRS ILSYLGITSETAYTDIL 171
+++ L IQ L N P ++ I+S +LS LGI E+ D++
Sbjct: 120

Query: 172 NI CELLLKQELNF AQFDFQKELS IDE HQQKI ILQRIRRAVKK 213
NI L E + AD ++L+ ++ + K QR+RRAV +
Sbjct: 180

Query: 214
++ +4-A 1 + DF H +YA+ F F + ++ ++ + S +I++K F
Sbjct: 240

Query: 271 I]
++K
Sbjct: 300 ffiAK


There is homology to SEQ ID 460.

A related GBS gene <SEQ ID 8723> and protein <SEQ ID 8724> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: -7.05
GvH: Signal Score (-7.5): -6.58

Possible site: 61
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -5.52 threshold: 0.0

INTEGRAL Likelihood = -5.52 Transmembrane 49 - 65 ( 46 - 66)
PERIPHERAL Likelihood = 7.37 155

modified ALOM score: 1.60 

*** Reasoning Step: 3 

Final Results

bacterial membrane --- Certainty=0.3208 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF00604(307 - 1125 of 1431) 

EGAD | 137180 | 146289 (3 - 304 of 310) hypothetical protein {Bacillus cereus} 
GP|1769946|emb|CAA67094.1| |X98455 orf1 {Bacillus cereus}
%Match =12.7
%Identity =34.1 %Similarity =53.0

Matches = 95 Mismatches = 123 Conservative Sub.s = 53 






168 198 228 258 288 318 348 375 

*C*W*YLSRNRAIPRAYFNGRAISRNDNCLS*SAKPJNNIYTVIP*KSI*VRR*YvKFYIIDDDPTITMILQDIIEE-DFN 

ilhlll =1 III: I'-: 

MFYYIVDDDEVFRSMLSQIIEDGDLG 
10 20 
405 435 465 495 525 555 585 615

NTVVRVNOTSSKAYIffiLLIADVDIVLIDLLMPIL 



EVIGESEDGAFVEAEQLNYKKA/DILFIDLLMPMRDGIETWHI-ASSPTGKIIMISQVESKQLIGEAYTLGVEYYITKPL 
40 50 60 70 80 90 100 



645 675 705 753 771 801 831 

NIIEVKSVVKRVTDTIEMQKKLNTIQNLLENTPSYQKP ITTSNLTKI RS ILSYLGI TSETAYTDILNI CELL 




120 130 140 150 160 170 



861 894 924 954 984 1014 




200 210 220 230 240 250 260 



1071 1095 1125 1155 1185 1215 1245 

FGFQNIHNE-AQL1QGKSMYGGKISL--KHFFDELILQSKTF*DLFKHGLIYYNHPKTFLFINLQQTPCLPQGVCFCF*F 




280 290 300 



SEQ ID 8724 (GBS356) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 73 (lane 3; MW 34kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 8; MW 59kDa). 

GBS356-GST was purified as shown in Figure 216, lane 7. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1051 

A DNA sequence (GBSx1123) was identified in S.agalactiae <SEQ ID 3243> which encodes the amino
acid sequence <SEQ ID 3244>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -6.48 Transmembrane 149 - 165 ( 147 - 172)
INTEGRAL Likelihood = -5.20 Transmembrane     - 53 ( 29 - 55)
INTEGRAL Likelihood = -2.50 Transmembrane 126 - 142 ( 126 - 142)
INTEGRAL Likelihood = -2.13 Transmembrane 62 - 78 ( 60 - 78)
INTEGRAL Likelihood = -0.64 Transmembrane 314 - 330 ( 314 - 330)
INTEGRAL Likelihood = -0.11 Transmembrane 89 - 105 ( 89 - 105)

Final Results

bacterial membrane --- Certainty=0.3590 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB06435 GB:AP001516 two-component sensor histidine kinase
[Bacillus halodurans]
Identities = 118/427 (27%) , Positives = 199/427 (45%) , Gaps = 25/427 (5%)

Query: 10 LERRQRIIISAIAIA-LAAQINISILADGFIMTLSLFILPVFLYFNDDINPILLCLGITF 68
L + II+S + A +A +IN + + F ++L I +FL F + 1+
Sbjct: 7 LSKDYMIILSMLLFAPIAGEINFYPVNETFRVSLGPPIFFLFLLFLRNTAAIVPGFFTAI 66




Query: 69 ASPIFRGI ILSIAGEAEIHQI IEFVLTDMAFYICYGITFYTIYWHRSYRNKGTFFFSI 1 1 128
A +FR ++++ E FYY+F R+ FII
Sbjct: 67 AVWFRVFLDTLHADFYWVDSFEIHYPTFFFYFTYSLLFSLAKVQRFHEQPLIIFLFGII 126

Query: 129
A+ E F+ ++ + + + ++P I L+ S V +S + L
Sbjct: 127

Query: 185
+R + + S + E ++K + E+I + L +E+ + H+ + HIi
Sbjct: 185

Query: 241 LDISRDVHEVKKDYQNI IKGLGTYFSVKNESTMALKDIFQI VLSYTRS 1 IQF 292
L+IS +VHE+KKD Q I GL S NES + +1 QI+ R+ Q
Sbjct: 245

Query: 293
++I + + + Y L+II+N+V NAVEAID KG +++ + L ++
Sbjct: 303

Query: 353
I D+GPGIPDK + +IFKPGF++KFD G GIGL++V M ++ GT+
Sbjct: 361

Query: 412
G+ FT+
Sbjct: 420


Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1052 

A DNA sequence (GBSx1124) was identified in S.agalactiae <SEQ ID 3245> which encodes the amino
acid sequence <SEQ ID 3246>. This protein is predicted to be ornithine carbamoyltransferase Otc6850
(argF). Analysis of this protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.64 Transmembrane 171 - 187 ( 171 - 187)

Final Results

bacterial membrane --- Certainty=0.1256 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB75986 GB:AJ272085 ornithine carbamoyltransferase 
[Staphylococcus aureus] 
Identities = 264/332 (79%) , Positives = 292/332 (87%) 

Query: 1 MKNLRNRSFLTLLDFSTAEVEFLLKLSEDLKRAKYAGIEQQKLVGKNIALIFEKDSTRTR 60
MKNLRNRSFLTLLDFS EVEFLL LSEDLKRAKY G E+ L KNIAL+FEKDSTRTR
Sbjct: 1 MKNLRNRSFLTLLDFSRQEVEFLLTLSEDLKRAKYIGTEK^MLKNKNIALLFEKDSTRTR 60

Query: 61 CAFEVAAHDQGAHVTYLGPTGSQMGKKETSKDTARVLGGMYDGIEYRGFSQETVETLAEF 120
CAFEVAAHDQGA+VTYLGPTGSQMGKKET+KDTARVLGGMYDGIEYRGFSQ TVETLAE+
Sbjct: 61

Query: 121
SGVPVWNGLTD DHPTQVLADFLTAKE L K Y DI FTYVGDGRNNVANALM GA+I+G
Sbjct: 121

Query: 181
M +HLVCPKEL P ELL++C+ IA G +1 IT DI +GV+ SDV+YTDVWVSMGEPD
Sbjct: 181

Query: 241 EVWKERIALLEPyRITQEMLNMTENPNVIFEHCLPSFHNIDTKVGYDIYEPCYGLKEMEVS 300

EVWKER+ LL+PY++ +EM++ T NPNVTFEHCLPSFHN DTK+G I +EKYG+ +EMEV+ 
Sbjct: 241 EVWKERLELLKPYQVNKEMMDKTGNPNVIFEHCLPSFHNADTKIGQQIFEKYGIREMEVT 300 

Query: 301 DEVFEGPHS WFQEAENRMHT I KAVMVATLGD 332

DEVFE SWFQEAENRMHTIKAVMVATLG+ 
Sbjct: 301 DEVFESKASWFQEAENRMHTIKAVMVATLGE 332 

There is also homology to SEQ ID 3118.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1053 

A DNA sequence (GBSx1126) was identified in S.agalactiae <SEQ ID 3247> which encodes the amino
acid sequence <SEQ ID 3248>. This protein is predicted to be a carbamate kinase (b2874). Analysis of this
protein sequence reveals the following:

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.48 Transmembrane 214 - 230 ( 214 - 230)

Final Results

bacterial membrane --- Certainty=0.1192 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA66367 GB:X97768 carbamate kinase [Clostridium perfringens] 
Identities = 162/313 (51%) , Positives = 207/313 (65%) , Gaps = 7/313 (2%) 

Query: 3 KIWALGGNAL GNS PEEQLRLVKHTAKSLVALI KKGHE I WSHGNGPQVGAINLG 57
KIV+ALG NAL S E QL + TA S+ LI+ GHE+ + HGNGPQVG I
Sbjct: 2 KI VLALGENALQKDSKDKSAEGQLETCRQTAISVADLIEDGHEVSI VHGNGPQVGQILAS 61

Query: 58 MNFAAESGQGTN- FPFPECGAMSQGYIGYHLQQSLLNELRQEGINKEVATI ITQIEVDES 116
+ A + G FPF GA S+GYIGYHLQ ++ EL + Gl K V TI TQ+ VD++
Sbjct: 62 IELAHQVDNGNPLFPFDWGAFSEGYIGYHLQNTIREELLKRGIEKSVDTITTQVIVDKN 121

Query: 117
D F+ PTKPIG+FY KE +EK+ +KGYT EDAGRGYRRWASP+P I+E +IKT+
Sbjct: 122

Query: 177
+++ +VIA GGGGIPV+ G EG+ AVIDKD ++ LA L AD L+ILTAVD V
Sbjct: 182

Query: 236
F K +QKAL E+N ++ Y+ +G+FA GSMLPKV AC F+ K A+I SL
Sbjct: 242

Query: 296
AL G+ GT+I K
Sbjct: 302



There is also homology to SEQ ID 3110. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1054

A DNA sequence (GBSx1127) was identified in S.agalactiae <SEQ ID 3249> which encodes the amino
acid sequence <SEQ ID 3250>. Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3558 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1055 

A DNA sequence (GBSx1128) was identified in S.agalactiae <SEQ ID 3251> which encodes the amino
acid sequence <SEQ ID 3252>. This protein is predicted to be a transmembrane protein (b2298). Analysis 
of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane --- Certainty=0.6243 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC22251 GB:U32741 conserved hypothetical transmembrane protein 
[Haemophilus influenzae Rd] 
Identities = 303/506 (59%) , Positives = 389/506 (75%) , Gaps = 6/506 (1%) 

Query: 10 NKRSKGFRMPGAFTILFILTIFSvIiATWWIPAGSYSKLQFDTASSKljVVTDPNGKTVHVP 69 

+K+ K F P AFTILF + I +V TW IP+GSYSKL +++ + W P 
Sbjct: 4 SKKKKTFNFPSAFTILFAILILAVGLTWVIPSGSYSKLTYNSTDNVFVVKAYGVDDKTYP 63 

Query: 70 ATQTQLDKMNVKIKIKEFTSGAISKPVSVPNTYKRLKQNPAGIGSVTTSMWGTIEAVDI 129 

AT LD +N+KIK+ FT G I KP+++P TY+R++Q+ GI +T SMV GTIEAVD+ 
Sbjct: 64 ATTDTLDNLNIKIKLSNFTEGVIKKPIAIPGTYQRVEQHHKGIEDITKSMVEGTIEAVDV 123 

Query: 130 MVFI^TOGGMIGvVRKSGAFESGLLALTKKTKGREFLLIFLVSLLMVLGGTLCGIEEEAV 189 

MVFI VLGGMIGV+ ++G+F +GL+AL KKTKG EF ++F VS+LMVLGGT CGIEEEAV 
Sbjct: 124 MVFIFVLGGMIGVINRTGSFNAGLMALVKKTKGNEFFIVFCVSVLMVLGGTTCGIEEEAV 183 



Query: 190 AFYPILVPIFLAMGYDSIICVGAIFLASSVGTSFSTINPFSSVIASNAAGISFTEGLSWR 249
AFYPILVP+FIA+GYD+I+CVGAIFLA+S+GT+FSTINPFS VTASNAAGI FTEG+ +R
Sbjct: 184 AFYPILVPVFLALGYDAIVCVGAIFLRASMGTAFSTINPFSWIASNAAGIQFTEGIGFR 243 

Query: 250 TAGCIAGAIFVWYLHWYAKKIKANPEFSYSYEDRVEFNAKWGMTTN-HTPSLFTIRQKI 308 
G + GA V+ YL+WY KKIKA+P FSY+Y+DR EF ++ + +T F+ R+K+

Sbjct: 244 ALGLVLGATCVIAYLYWYCKKIKADPSFSYTYDDREEFRQRYMKNFDPNTTIPFSARRKL 303 

Query: 309 ILSLFVISFPLMVWGVMSQGWWFPTMASSFLAITIIIMFLTATGANGIGERDWDEFVNG 368 
IL+LF ISFP+M+WGVM GWWFP MA+SFLAITIIIMF+ +G+ E+D+++ F G 

Sbjct: 304 ILTLFCISFPIMIWGVMVGGWWFPQMAASFLAITIIIMFI SGLSEKDIMESFTEG 358

Query: 369 ASSLVGVSLIIGLARGINIILSQGYISDTMLYTASKLASHVSGSVFIIVMMFIYFVLGFV 428 

AS LVGVSLI IGLARG+N++L QG ISDT+L S + S + GSVFI+ + ++ LG + 
Sbjct: 359 ASELVGVSLIIGLARGVNLVLEQGMISDTILDYMSNWSGMPGSVFILGQLWFIFLGLI 418 


Query: 429 VPSSSGIAVLSMPIIAPIADTVGIPRSVVVMAYQFGQYAMLFLAPTGLVMATLQMLDMKY 488 

VPSSSGLAVLSMPI+APLAD+VGIPR +W AY +GQYAMLFL&PTGLV+ TLQML + + 
Sbjct: 419 VPSSSG^VLSMPIMAPIADSVGIPRDIWSAYNWGQYAMLFLAPTGLVLVTLQMLQIPF 478 

Query: 489 SHWLKFVWPWLFLLIFGGGLLVLQV 514

W+KFV P++ LL+ G LLV+QV 
Sbjct: 479 DRWVKFVMPMIGCLLLIGSILLWQV 504 
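
The summary line that heads each alignment ("Identities = n/N (p%)") can be parsed to recover the raw counts. The sketch below is illustrative only: the helper names are assumptions, and the integer-floor percentage is an observation that matches the Identities values printed in this text, not a statement about how the alignment program computes every field.

```python
import re

# One "Name = count/length (percent%)" group from an alignment summary line.
PAIR_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)")

def parse_summary(line):
    """Map each statistic name to (count, alignment_length, reported_percent)."""
    return {
        name: (int(count), int(total), int(pct))
        for name, count, total, pct in PAIR_RE.findall(line)
    }

def floor_percent(count, total):
    """Integer percentage, rounded down."""
    return (100 * count) // total

# Summary line of the Haemophilus influenzae alignment in Example 1055.
line = "Identities = 303/506 (59%) , Positives = 389/506 (75%) , Gaps = 6/506 (1%)"
stats = parse_summary(line)
count, total, reported = stats["Identities"]
print(count, total, reported, floor_percent(count, total))
```

Here 303 identical positions over an alignment length of 506 give 59% when rounded down, in agreement with the reported figure.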

A related DNA sequence was identified in S.pyogenes <SEQ ID 3253> which encodes the amino acid
sequence <SEQ ID 3254>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane --- Certainty=0.6286 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB94000 GB:AF008219 unknown [Borrelia afzelii] 
Identities = 174/496 (35%) , Positives = 306/496 (61%) , Gaps = 37/496 (7%)

Query: 10 RIPSSYTVLFIIIAIMAVLTWFIPAGAYETAK---GGG VI SGTYKTVASNPQGFF 61 

++PSS+T++F +1 + +LT+ IPAG ++ G G +++GTY+T+ P+GF 

Sbjct: 3 KMPSSFTIIFSLIVFVTILTYVIPAGKFDKEFRQIGDGPKREIIVAGTYQTIDRGPRGFL 62 


Query: 62 DILMAPVRGMLGVEGTDGAIQVSFFIL1WGGFLGVVNKTGALDTGIASVVRKNKGREKML 121 

+M + M +G + A +V F+L+VGG G++ KTGA+D GI S+++K ++K+L 
Sbjct: 63 HPIMTILTAMS--KGMEHAAEVIIFVLIVGGAYGIIMKTGAIDAGIYSLIKKLGHKDKLL 120 

Query: 122 IAILIPLFALGGTTYGMGEETMAFYPLLIPVMIAVGFDSIVAVAIILIGSQIGCLASTIN 181

I +L+ +F++GGT GM EET+ FY ++IP+++A+G+D++V VAII +G+ +G +AST+N 
Sbjct: 121 IPLLMFIFSIGGTVTGMSEETLPFYFVMIPLIVALGyDNWGVAIIAWaGVGTMASTVN 180 

Query: 182 PFATGVAADAAGVSIADGMIWRVIQWILVGMSIWFVYNYASKIEEDPSKSLVADKEEEH 241 
PFATG+A+ A +S+ DG +R++ + I + ++I +V YAS+I++DPSKSLV K+ EH

Sbjct: 181 PFATGIASAIASISLQDGFSFRIVLYFISILVAIIYVCWYASRIKKDPSKSLVYSKKNEH 240 

Query: 242 KELF-QLQNSGEDIjNKRQRNVLTIFTLTFVIMII^LIPWEDFGIKFFTNINTWLTTMPIL 300 
+ F + + S ED NV TF ++ L+ FG I + ++ L 

Sbjct: 241 YQYFVKNEISKED NVQNTLEFTFARKLVLLL FGFM ILFLVFSIVQL 286

Query: 301 GGVIGKTMGAFGTWYFPEITMLFIMMGVLVAIVYRMSEEDFFSSFLTGAGEFLGVAMICA 360

G W+ E+TML++ + ++ A + R+ E + + +F+ G+ + A+I 

Sbjct: 287 G WWMQEMTMLYLGVAIISAFICRLGESEMWDAFVKGSESLITAALIIG 334 


Query: 361 IARGIQVIMNGGMITATILHLGETSLSGLSSQVFVILAYIFYLPMSFLIPSTSGIAGATM 420 

+ARG+ ++ + G+ITAT+L+ L L F+IL I + + F++PS+SG A TM 

Sbjct: 335 LARGVMIVCDDGLITATMLNAATNFLYNLPRPFFIIIiNEIIQIFIGFIVPSSSGHASLTM 394 

Query: 421 GI^PLGQFSNVPAHLVITAFQSASGIL^ISPTSAIVMGALALGRVDLGTWWKFIGKFI 480

IMft.PL F ++ V+ A Q++SG++N+I+PTS ++M L + ++ GTW+KF+ 
Sbjct: 395 PIMAPIADFLSIGRSSWIAMQTSSGLINLITPTSGVIMAVLGISKLSYGTWFKFVLPLF 454 

Query: 481 VMVMLVSVLLLWATF 496 
++ +S+L+++ +

Sbjct: 455 IIEFFISILVIIANVY 470 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 158/542 (29%) , Positives = 274/542 (50%) , Gaps = 92/542 (16%) 

Query: 11 KRSKGFRMPGAFTILFILTIFSVIATWWIPAGSYSKLQFDTASSKLVVTDPNGKTVHVPA 70 

++ +GFR+P ++T+LFI+ + TW+IPAG+Y +TA 
Sbjct: 4 EKKRGFRI PSSYTVLFI I IAIMA.VLTWFI PAGAY ETAKG 42 

Query: 71 TQTQLDKMNVKIKIKEFTSGAISKPVSVPNTYKRLKQNPAGIGSVTTS^^VNG TI 124

G IS TYK +NPG + +VG T 

Sbjct: 43 GGVIS GTYKTVASNPQGFFDILMAPVRGMLGVEGTD 78 

Query: 125 EAvDIWFIMVLGGMIGVVRKSGAFESGLLALTKKTKGREFLLIFLVSLLMVLGGTLCGI 184 
A+ + FI+++GG +GW K+GA ++G+ ++ +K KGRE +LI ++ L LGGT G+

Sbjct: 79 GAIQVSFFILMVGGFLGWNKTGALDTGIASWRKNKGREKMLIAILIPLFALGGTTYGM 138 

Query: 185 EEEAVAFYPILVPIFIAMGYDSIICVGAIFIASSVGTSFSTINPFSSVIASNAAGISFTE 244 
EE +AFYP+L+P+ +A+G+DSI+ V I + S +G STINPF++ +A++AAG+S + 
Sbjct: 139 GEETMAFYPLLIPVMIAVGFDSIVAVAIILIGSQIGCLASTINPFATGVAADAAGVSIAD 198

Query: 245 GLSWRTAGCIAGAI FVWYLHWYAKKI KANPK FS YS YEDRVEFNAKWGMTTNHTPSLFTI 304 

G+ WR + + +++ YA KI+ +P S D+ E + + N L 

Sbjct: 199 GMIWRVIQWVILVGMSIWFVYNYASKIEEDPSKSL-VADKEEEHKELFQLQNSGEDL-NK 256 


Query: 305 RQKIILSLFVISFPLMV W GVMSQ GWWF 331 

RQ+ +L++F ++F +M+ W GV+ + W+F 

Sbjct: 257 RQRNVLTIFTLTFVIMILSLIPWEDFGIKFFTNINTWLTTMPILGGVIGKTMGAFGTWYF 316 

Query: 332 PTMASSFLAITIIIMFLTATGANGIGERDWDEFVNGASSLVGVSLIIGLARGINIILSQ 391
P + F+ + +++ + + E D F+ GA +GV++I +ARGI +I++ 

Sbjct: 317 PEITMLFIMMGVLVAIVYR MSEEDFFSSFLTGAGEFLGVAMI CAIARGIQVIMNG 371 

Query: 392 GYISDTMLYTASKI^SHVSGSVFIIvMMFIYE^GEWPSSSGIAvlSMPIIAPIiADTVG 451 
G 1+ T+L+ S +S VF+I+ Y + F++PS+SGLA +M I+APL

Sbjct: 372 GMITATILHLGETSLSGLSSQVFVILAYIFYLPMSFLIPSTSGLAGATMGIMAPLGQFSN 431 

Query: 452 IPRSVWMAYQFGQYAMLFIAPT-GLVMATLQMLDMKYSHWLKFVWPVVLFLLIFGGGLLVL 512 
+P +V+ A+Q + ++PT +VM L + + W KF+ ++ +++ LLV+ 

Sbjct: 432 VPAHLVITAFQSASGIIjNMISPTSAIVMGAIALGRvDLGTWWK^ 493

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1056 

A DNA sequence (GBSx1129) was identified in S.agalactiae <SEQ ID 3255> which encodes the amino
acid sequence <SEQ ID 3256>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.83 Transmembrane 25 - 41 ( 18 - 47)
INTEGRAL Likelihood =-10.46 Transmembrane 153 - 169 ( 148 - 176)

Final Results

bacterial membrane --- Certainty=0.5331 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
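
The "INTEGRAL Likelihood" lines list each predicted transmembrane segment as a core range followed by an extended range in parentheses. A minimal sketch of reading them is given below; the field names (`core`, `extended`) are labels chosen here for illustration and are not terms defined by the prediction software.

```python
import re

# "INTEGRAL Likelihood =-10.83 Transmembrane 25 - 41 ( 18 - 47)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return one dict per predicted transmembrane segment found in text."""
    segments = []
    for m in TM_RE.finditer(text):
        likelihood = float(m.group(1))
        core = (int(m.group(2)), int(m.group(3)))
        extended = (int(m.group(4)), int(m.group(5)))
        segments.append({
            "likelihood": likelihood,
            "core": core,
            "extended": extended,
            "core_length": core[1] - core[0] + 1,  # inclusive residue count
        })
    return segments

# Lines copied from Example 1056.
report = """
INTEGRAL Likelihood =-10.83 Transmembrane 25 - 41 ( 18 - 47)
INTEGRAL Likelihood =-10.46 Transmembrane 153 - 169 ( 148 - 176)
"""
for seg in parse_segments(report):
    print(seg)
```

Both segments in Example 1056 have a 17-residue core under this inclusive count.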



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB13183 GB:Z99110 similar to two-component sensor histidine 
kinase [YkoG] [Bacillus subtilis] 
Identities = 119/446 (26%) , Positives = 212/446 (46%) , Gaps = 18/446 (4%) 



Query: 17 TQITLWySSFIFILVIGVLIGSFFISKSIAENKSKKNLEAKAVQMSQALAKGHRYEAFED 76

T+I L+ S + IL+I V + I S +K L + +++AL
Sbjct: 5 TKIHLYTSISLLILLILVHTAVYLIFSSALTSKDAARLADETDNIAEALRAAETEGVALQ 64

Query: 77 GI FYS VYDQNGKV- 1 YSGFPKGFKRDLDHQHKHKKKLSLFSMEN RTFQYVDI 127

+ + NGV++GK + LSSE +F +
Sbjct: 65 DMLQAYLPANGMVRWNGDQKAVMTITKEKAYKDFPLSFHSGETADVRKPDGKLFAEAAV 124

Query: 128 PISGKNQWLRAIRTVDRLDKQLTELLFSLGIVLPLMLI I ITVG GYLILKRTFRPIQ 183

P+ + + +++ V+RL+ E LF Ii I+L + + b L+ +R P1+
Sbjct: 125 PVIWTDGQWSLQLVERLENT-EESLFLLKIILIAASAAVCIASFFAGSLLARRIINPIR 183

Query: 184 EITETAQFITQNEDYTKRIITKNNENELTELAAVINTMLASIESSFVREKQFNNDVSHEL 243

+ T+I +++++ + + +EL ++ N M ++ + +++QF D SHEL
Sbjct: 184 RLMITMKDIQRDKEFKTISLEGQSNDELYQMGLTFNEMAMMLKEHYDKQQQFVQDASHEL 243

Query: 244 RTPVTVILSESEYGKNYAENLSEA-KESFEVIHRQSLSMKKLVEQLLELTKAENPLSIQL 302

+TP+T+I S S K + E +ES E IH +++ MKKL QLL L K+ L + L
Sbjct: 244 KTPLTIIESYSSLMKRWGAKKPEVIBESIEAIHSEAVHMKKLTNQLLALAKSHQGLEVDL 303

Query: 303 EPLNFSIMMKQLVSDSSRLLDNTPIHLDSQIEDDLWIIGQQTLLKRLFDNLFSNAIKFTN 362

+ ++I +V+ + + I L++ ++ L + + +K+L L NAIK++
Sbjct: 304 KTI DL - 1 KAARA VMQTLQS VYQRDI LLETD - KESLLVKADEERI KQLLT I LLDNAI KYSE 361

Query: 363 NHISISLRQSDNQIVFSIKDNGLGISVDDQSKIWNRFYQVDSARTKDSQSGIGLGLSLVK 422

I +S + + S++D G+GI + ++ RFY+ D AR + + G GLGLS+ K
Sbjct: 362 KPIEMSAGTRNGRPFLSVRDEGIGIPEEHIPHLFERFYRADEARNRKT-GGTGLGLSIAK 420

Query: 423 QIATIHRAKIWVDSKPDDGSQFTLTF 448

QIA H ++ V SKP G+ T+ F
Sbjct: 421 QIADEHGIELSVKSKPGQGTAVTMQF 446






There is also homology to SEQ ID 1178.

SEQ ID 3256 (GBS77) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 21 (lane 2; MW 78.5kDa) and in Figure 28 (lane 2; MW 78.5kDa). 

GBS77-GST was purified as shown in Figure 195, lane 4. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1057 

A DNA sequence (GBSx1130) was identified in S.agalactiae <SEQ ID 3257> which encodes the amino
acid sequence <SEQ ID 3258>. This protein is predicted to be CopR protein (tcrA). Analysis of this protein
sequence reveals the following:



Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3963 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAC07978 GB:AJ278983 CopR protein [Ralstonia metallidurans]
Identities = 102/221 (46%) , Positives = 145/221 (65%)

Query: 1 MKILVVEDEFDLNRSIVKLLKKQHYSVDSASNGEEALQFVSVAEYDVIILDVMMPKMDGF 60 

MK+LWEDE + + L + + VD +NG + F YD+IILDVM+P +DG+ 

Sbjct: 1 MKLLVVEDEVKTGEYLRQGLTEAGFVVDLVaNGLDGQHFAVNETYDLIILDVMLPDVDGW 60 






Query: 61 TFLKLLRNKGSQVSILMLTARDAVEDRIAGLDFGADDYLVKPFEFGELMARIRAMLRRAN 120 

L +R G+ V +L LTARD+V DR+ GL+ GADDYLVKPF F EL+AR+R +LRR 
Sbjct: 61 HILHAIRASGNAVPVLFLTARDSVADRVRGLELGADDYLVKPFAFSELLARVRTLLRRGA 120 



Query: 121 RQVSSDDIQIQDITINLSTKQVWRNDNLIDLTAKEYEVLEYLARHRDQVLSRHQIREHVW 180

Q++ D IQ+ D+ ++LS ++ R I LT+KE+ +LE AR R +VL R I VW 
Sbjct: 121 VQLAMDRIQVADLILDLSRRRASRGGRRITLTSKEFALLELFARRRGEVLPRSLIASQVW 180 

Query: 181 DYDYYGESNIIDVLIKNLRRKLDNNRDGSLIKTKRGLGYVI 221 
D ++ +SN+IDV 1+ LR K+D+ + LI+T RG+GYV+

Sbjct: 181 DMNFDSDSNVIDVAIRRLRAKIDDGFEVKLIQTVRGMGYVL 221 

There is also homology to SEQ ID 3260. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1058 

A DNA sequence (GBSx1131) was identified in S.agalactiae <SEQ ID 3261> which encodes the amino
acid sequence <SEQ ID 3262>. Analysis of this protein sequence reveals the following: 

Possible site: 40

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.45 Transmembrane 18 - 34 ( 16 - 36)

Final Results

bacterial membrane --- Certainty=0.2381 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10281> which encodes amino acid sequence <SEQ ID 
10282> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3262 (GBS78) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 4; MW 23.8kDa). 

The GBS78-GST fusion product was purified (Figure 194, lane 4) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 317), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1059 

A DNA sequence (GBSx1132) was identified in S.agalactiae <SEQ ID 3263> which encodes the amino
acid sequence <SEQ ID 3264>. Analysis of this protein sequence reveals the following: 

Possible site: 36

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.04 Transmembrane 15 - 31 ( 6 - 35)
INTEGRAL Likelihood = -1.28 Transmembrane 51 - 67 ( 51 - 67)

Final Results

bacterial membrane --- Certainty=0.5416 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3264 (GBS79) was expressed in E.coli as a GST-fusion product. GBS79d was expressed in E.coli
as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 154 (lanes 17 & 18;
MW 51kDa), in Figure 155 (lane 17; MW 51kDa) and in Figure 187 (lane 13; MW 51kDa). GBS79d was
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure
155 (lanes 2-4; MW 26kDa) and in Figure 183 (lane 5; MW 26kDa). Purified GBS79d-GST is shown in
Figure 243, lane 2.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1060 

A DNA sequence (GBSx1133) was identified in S.agalactiae <SEQ ID 3265> which encodes the amino
acid sequence <SEQ ID 3266>. Analysis of this protein sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5326 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10279> which encodes amino acid sequence <SEQ ID 
10280> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG20974 GB:AE005164 Vng6349c [Halobacterium sp. NRC-1]

Identities = 97/358 (27%) , Positives = 163/358 (45%) , Gaps = 20/358 (5%) 

Query: 35 DPQIIKLTTRANIAIGTYEGFLESIINPMLLISPLLSQEAVLSSKLEGTHATLKDLLNYE 94 
D + A +G G + P +L + LL +EA+ S+++EG L + E 

Sbjct: 70 DDDFYETLADATFWLGKLSGVSLELDFPPVLYTSLLRKEAMESAEIEGADVDYDALYSLE 129

Query: 95 AGNKVDIERDELHEII KYRKALFYALENISTINNIDSKGLPLSNRIIKEMHKIL 148 

D RDE E + R+ L Y 1+ +D+ G L+ ++ ++H+ L
Sbjct: 130 T-RTFDEGRDEPSETTAAAETIODTREVIJ^TAVKEGIDALDA-GEEIiNVELLHDLHETL 187

Query: 149 LDNV- - -RGSSKNPGNFKRSQNYIGSVSSISyTPVPAEKTPEYMSNLEQYIHYD-DLDLL 204

L V R + G++K + NY+G + P + M L Y L
Sbjct: 188 LTGVPDDRVDTDTI GDYKTNPNYLGD FLPPAPGAVEDLMDGLFTYYRTGGSYHPL 242

Query: 205 VQSAIIHAQFEMIHPFEDGNGRIGRLLIPLFLYYQELLSYPTFYMSSYFERDRSLYISHL 264

V A+ H QFE IHP+ DGNGR+GRLLI L LY +LL P Y+S Y R+++ Y+ +
Sbjct: 243 VDIALFHYQFETIHPYGDGNGRLGRLLITLQLYDADLLERPISnjYLSEYLNRNKTTYVERM 302

Query: 265 SNISKDMSIWiraWFEYYLEGVILSAEESTKKAQDILSLYNIMKEQVIPKIiNSVSGIQLLDF 324

+ W+ W +++EG+ AES++++L ++ K++QL
Sbjct: 303 EGTOFHGEWFAWLSFFIEGIARQAHESVERTRAIADLRREYEHEYGGKAYTKN--QIiAVT 360

Query: 325 I FSAPI FKAEQVSEHLKI S KRTTYTLLNKL I DEG YL - STDNAQRNRTYYCPQLLS I VQ 381

+F P ++ V I + T +N+L++EG L RN+ Y ++ I++
Sbjct: 361 LFEQPYITSKTVQRLFDIEQSTASRAINEJjvNEGILEEVPRHGRNKEYRAREIFEILE 418



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1061 

A DNA sequence (GBSx1134) was identified in S.agalactiae <SEQ ID 3267> which encodes the amino
acid sequence <SEQ ID 3268>. Analysis of this protein sequence reveals the following: 

Possible site: 47

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4370 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

RGD motif : 46-48 
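
Motif positions such as the one reported above are 1-based, inclusive residue ranges. The following sketch shows how such a range could be located in an amino acid sequence; the function name `find_motif` and the toy sequence are hypothetical, and the actual sequence of SEQ ID 3268 is not reproduced here.

```python
def find_motif(sequence, motif="RGD"):
    """Return 1-based inclusive (start, end) positions of every motif occurrence."""
    hits = []
    start = sequence.find(motif)
    while start != -1:
        hits.append((start + 1, start + len(motif)))
        start = sequence.find(motif, start + 1)
    return hits

# Hypothetical 50-residue toy sequence with RGD placed at residues 46-48.
toy = "A" * 45 + "RGD" + "AA"
print(find_motif(toy))
```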

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3268 (GBS299) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 58 (lane 2; MW 62.2kDa) and in Figure 60 (lane 4; MW 62.2kDa). 

GBS299-GST was purified as shown in Figure 207 (lane 4) and Figure 225 (lanes 2-3). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1062 

A DNA sequence (GBSx1135) was identified in S.agalactiae <SEQ ID 3269> which encodes the amino
acid sequence <SEQ ID 3270>. Analysis of this protein sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4176 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1063 

A DNA sequence (GBSx1136) was identified in S.agalactiae <SEQ ID 3271> which encodes the amino
acid sequence <SEQ ID 3272>. Analysis of this protein sequence reveals the following: 

Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1789 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1064 

A DNA sequence (GBSx1137) was identified in S.agalactiae <SEQ ID 3273> which encodes the amino
acid sequence <SEQ ID 3274>. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3748 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1065 

A DNA sequence (GBSx1138) was identified in S.agalactiae <SEQ ID 3275> which encodes the amino
acid sequence <SEQ ID 3276>. Analysis of this protein sequence reveals the following: 

Possible site: 51

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1638 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB12294 GB:Z99106 similar to transposon protein [Bacillus subtilis] 
Identities = 84/291 (28%) , Positives = 138/291 (46%) , Gaps = 6/291 (2%) 

Query: 6 MLDYLAOTIKGLAPDDVIEKILILPKDKFVLNEWGINKyQRHYSFSEIKTOFNKDWQSKM 65

M+DY+ V+ K D +IE++L L KD + G Y Y IKV+++ ++ 

Sbjct: 31 MVDYIRVSFKTHDVDRIIEEVLHLSKDFMTEKQSGFYGYVGTYELDYIKVFYSAPDDNR- 89 

Query: 66 GVFIELRGQGCRQYEEYMENNVNNWVTLMKRISECHSNVTRLDIANDIFDDSLSVPLIYS 125 
GV IE+ GQGCRQ+E ++E W + + + TR D+A D S+P +

Sbjct: 90 GVLIEMSGQGCRQFESFLECRKKTWYDFFQDCMQQGGSFTRFDLAIDDKKTYFSIPELLK 149 

Query: 126 YCKKQLC I STAKTFDYHEKSLLENGEKVGEMVTIGVRGTQQW - CVYNKLLEQKLDQELPN 184 
+K CIS + D++ L +G G + G + ++ + CYK EQ +P 
Sbjct: 150 KAQKGECISRFRKSDFNGSFDLSDGITGGTTIYFGSKKSEAYLCFYEKMYEQAEKYNIPL 209

Query: 185 TPL - SWTRAELRCWQEKANLLAKQI KEGRPLKEI YFEVINGHYRFVSPRDKDSNRWRRKT 243 

L W R ELR E+A + + + + h 1 ++IN + RFV D++ R KT 
Sbjct: 210 EELGDWI^YELRLKIffiRAQVAIDALLKTKBLTLIAMQIINNYVRFVD-ADENITREHWKT 268 


Query: 244 VKOTJNDYLETQEKTVLSVKRTKPTLKRSEKWTEKQVSRTLGKLYVAKAESH 294 

+W+D++ + L VK K ++S W + T+ V +A+ H 

Sbjct: 269 SLFWSDFIGDVGRLPLYVKPQKDFYQKSRNWLRNSCAPTM- - KMVLEADEH 317 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1066 

A DNA sequence (GBSx1139) was identified in S.agalactiae <SEQ ID 3277> which encodes the amino
acid sequence <SEQ ID 3278>. This protein is predicted to be integrase. Analysis of this protein sequence
reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1914 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB70622 GB:AJ243106 integrase [Streptococcus thermophilus] 
Identities = 135/474 (28%) , Positives = 233/474 (48%) , Gaps = 68/474 (14%) 

Query: 20 KAGNVLVKFAMRFTHPITKKSHKKYLSTGASKGWFTTKATPSKKLPSGKERLIjVSDIKNT 79 
K G + VKF F + +T K ++ LS W+T +KK +GK +L S

Sbjct: 19 KTGYIEVKFRTYFNNQLTNK-RREILSD WYTIV NKKDTTGKI KL--SPQIKA 67 

Query: 80 QLITQVTQEMKLvDDYIAELMGIKPKKAKKLLTLEEIAKPFDKDGNFYGKAFKAWH--- 136 
+ ++ ++ NK+ ++ ++ K +TL+E+ + WH 

Sbjct: 68 IIHKELQEKANKVYEELTRTIL LEKSDITLDEV WNEWHNER 108

Query: 137 -ERVKPANNTLKTRVTIYNRYIEPNFDTRMSITKFAFMTDEIQNIjIN ASSMHMAR 190 

ER A TL Y +1 + SI K + I+NL++ + +A+ 

Sbjct: 109 VERQLVAPKTLAGEDGRYRNHITKQIP-KNSILK-NIPSSLIKNLLDNLYPIGNHKRLAQ 166 






Query: 191 NLHIYLKMIFDWSVENGQITLTQDPIASNKVKRRVLTKSEEQDK-KREDIAEKYLEASEV 249 

+ L 1+ +++ + 1+ Q+P+ + R+ L S+E D+ K+ DI ++YLE+ E+ 
Sbjct: 167 GVKSDLTSIYKFAILHDYISPDQNPMPYISIGRKGL--SDELDRLKKSDIEDQYLESWEL 224

Query: 250 NHVLRLIESWTNRPDNQLIADVLRMIFLTGMRPSE\^GLNEDM^ 309

VL ++ + N+ A + LTGMR EVLGL E+ +DF K V RA+ 

Sbjct: 225 KEVLSIVRKY NEQYARIFEFQALTGMRIGEVLGLKEEAIDFNJCNIASVIRTRATH 279 

Query: 310 WKSDDMMEA2^DEKERyRADLKTKESVRTIPMSPEVEKII.RHYIDRNKFQAQFSPTYQD 369 

+ + + Y ++K +S R + +S +IL+ 1+ N +F+P Y+D 

Sbjct: 280 GGASE DSYEGNVKNLQSYRNVQLSKRAIEILKEEIELNHQHIRFNPDYKD 329 

Query: 370 LGYIiFTRTYIRAGNRQGSPLYHHELSQFLRGGSSQSAKYNKKAGKPYK DIDSFLDFG 426 

G++FT I + G+PL+++ L+ FL SS++ K N+ G P + DID+ L F 
Sbjct: 330 NGWIFTSKSIHKPDYNGTPLHYSVLNNFL--NSSENGKLNRN-GNPRRAGIDIDNKLSFK 386 

Query: 427 RPIHVIPHMFRHSFISIMASEGIDLPTIREFVGHSEDSKEIERWLHVIKKQKD 480 

+ H+ H+FRH+ IS +A +G+ h I++ VGHS S+ + +YLH+ KK KD 
Sbjct: 387 K--HITTHIFRHTHISFLAEQGVPLEAIQDRVGHSRGSR-VTEIYLHITKKTKD 437 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3279> which encodes the amino acid 
sequence <SEQ ID 3280>. Analysis of this protein sequence reveals the following: 

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5203 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 82/357 (22%) , Positives = 155/357 (42%) , Gaps = 52/357 (14%) 



Query: 135 WHERVKPANNTLKTRVTIYNRYIEPNFDTRMSITKFAFMTDEIQNLINA- - SSMHMARNL 192 

W K+T + R+ D+IK T +Q++I+ S + 
Sbjct: 73 WEHHQKSLKSTSWSLDFRIRELRNLIDPEVMIAKIT--TKyLQSIIDKIPGSYDKRKRA 130 

Query: 193 HIYLKMIFDWSVENGQITLTQDPIASNKVKRRVLTKSEEQDKKREDIAEKYLEASEVNHV 252 

LK FD+++ +++ +P+ S ++ + V T K ED+A+K+LE E+ 

Sbjct: 131 RQLLKQTFDYAIALEYVSI - -NPVISTQLAKPVKTI KDFEDVAQKFLEKDELK- - 181 

Query: 253 LRLIESWTNRPDNQLIADVLRMIFLTGMRPSEVLGIjNEDMLDFEKKWIKVHWQRASKNKS 312 

RL++ R + +A+ +LGR EL + D + + I++H 
Sbjct: 182 -RLLDEMYRRKGSIKMAYLAEFMSLNGCRIGEALAIQPD--NIKNDIIEIH 229 

Query: 313 DDMMFALNLDEKERYRADLKTKESVRTIPMSPEVEKILRHYIDRNKFQAQFSPTYQDLGY 372 

++ + + + KT S R ++ ++I++ + N + +P Y+D+GY 
Sbjct: 230 -GTLDYTSNGYRNAIKTTPKTNSSWRETLITKREKEIIQDILKINALEKNTNPNYKDMGY 288 

Query: 373 LFTRTYIRAGNRQGSPLYHNELSQFLRGGSSQSAKYNKKAGKPYKDIDSFLDFGRPIHVI 432 

+F +R G P+ N L+ +R NK+ KP + + 
Sbjct: 289 IFI SRNGVPIQDNALNTSIRAA NKRLEKPIQK ELT 323 

Query: 433 PHMFRHSFISIMASEGIDLPTIREFVGHSEDSKEIERVYLHVIKKQKDTMRGAVEKL 489 

H+FRH+ +S +A + L TI + VGH+ DSK +++Y HV K K+ + + +L 
Sbjct: 324 SHIFRHTLVSRIAENKVPLKTIMDRVGHA-DSKTTO^IYTHVTKSMKNEVVDIIjNRL 379 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 1067 

A DNA sequence (GBSx1140) was identified in S.agalactiae <SEQ ID 3281> which encodes the amino
acid sequence <SEQ ID 3282>. Analysis of this protein sequence reveals the following: 

Possible site: 42

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3023 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10277> which encodes amino acid sequence <SEQ ID 
10278> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAB64982 GB:U43834 Ydr540cp [Saccharomyces cerevisiae]

Identities = 88/170 (51%) , Positives = 117/170 (68%) , Gaps = 3/170 (1%) 

Query: 36 ^TYSDKNELKEEVLKSYKKYIAEFNDIPEKLKDLRIDEVDRTPAENLAYQVGWTTLILK 95 
MR Y+ K ELKEE+ K Y+KY AEF I E KD +++ VDRTP+ENL+YQ+GW L+L+ 
Sbjct: 1 MREYTSKKELKEEIEKKYEKYDAEFETISESQKDEKVETVDRTPSENLSYQLGWVNLLLE 60

Query: 96 WESDEQSGLEVKTPTETFKWNQLGELYQHFTETYASLTIKELTAQLNDNVDAIGNMIDSM 155 

WE+ E +G V+TP +KWN LG LYQ F + Y +IKE A+L + V+ + I ++ 
Sbjct: 61 WEAKEIAGYNVETPAPGYKWNNLGGLYQSFYKKYGIYSIKEQRAKLREAVNEVYKWISTL 120 


Query: 156 SDEVIjFKPHMRNWADSATKNAVWEvYKFIHINTVAPFGTFRTKIRKWKKV 205 

SD+ LF+ R W AT A+W VYK+ IHINTVAPF FR KIRKWK++ 
Sbjct: 121 SDDELFQAGNRKW- - -ATTKAMWPVYKWIHINTVAPFTNFRGKIRKWKRL 167 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1068 

A DNA sequence (GBSx1141) was identified in S.agalactiae <SEQ ID 3283> which encodes the amino
acid sequence <SEQ ID 3284>. This protein is predicted to be 50S ribosomal protein subunit L33-related
protein. Analysis of this protein sequence reveals the following:

Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5420 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB66692 GB:U89998 50S ribosomal protein subunit L33 
[Lactococcus lactis subsp. cremoris] 
Identities = 43/49 (87%) , Positives = 46/49 (93%) 

Query: 1 MRVNITLEHKESGERLYLTSKNKRNTPDRLQLKKYSPKLRKHVVFTEVK 49

MRVNITLEHKESGERLYLT KNKRNTPD+L+LKKYS KLRKHV+F EVK 
Sbjct: 1 MRVNITLEHKESGERLYLTQKNKRNTPDKLELKKYSKKLRKHVIFKEVK 49 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3285> which encodes the amino acid
sequence <SEQ ID 3286>. Analysis of this protein sequence reveals the following:

Possible site: 46 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.5394 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 48/49 (97%) , Positives = 48/49 (97%) 

Query: 1 MRVNITLEHKESGERLYLTSKNKRNTPDRLQLKKYSPKLRKHVVFTEVK 49

MRVNITLEHKESGERLYLTSKNKRNTPDRLQLKKYSPKLRKHV FTEVK
Sbjct: 1 MRVNITLEHKESGERLYLTSKNKRNTPDRLQLKKYSPKLRKHVTFTEVK 49

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1069 

A DNA sequence (GBSx1142) was identified in S.agalactiae <SEQ ID 3287> which encodes the amino
acid sequence <SEQ ID 3288>. This protein is predicted to be 50S ribosomal protein subunit L32-related
protein. Analysis of this protein sequence reveals the following:

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3577 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB66691 GB:U89998 50S ribosomal protein subunit L32 
[Lactococcus lactis subsp. cremoris] 
Identities = 44/53 (83%) , Positives = 48/53 (90%) 

Query: 1 MAKPARHTSKAKRNKRRTHYKLTAPSVQFDETTGDYSRSHRVSLKGYYKGRKI 53

MA PARHTS AK+N+RRTHYKLTAP+V FDETTGDY SHRVSLKGYYKGRK+ 
Sbjct: 1 MAVPARHTSSAKKNRRRTHYKLTAPTVTFDETTGDYRHSHRVSLKGYYKGRKV 53 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3289> which encodes the amino acid
sequence <SEQ ID 3290>. Analysis of this protein sequence reveals the following:

Possible site: 35 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.5148 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 38/39 (97%) , Positives = 39/39 (99%)

Query: 22 LTAPSVQFDETTGDYSRSHRVSLKGYYKGRKIAKANEAK 60 

+TAPSVQFDETTGDYSRSHRVSLKGYYKGRKIAKANEAK 
Sbjct: 1 MTAPSVQFDETTGDYSRSHRVSLKGYYKGRKIAKANEAK 39 
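
The identity and positive counts in an alignment header can be recovered from the three alignment rows themselves. The sketch below assumes the usual convention that the middle row repeats the residue at identical positions, shows "+" at conservative substitutions and is blank elsewhere; the function name `count_matches` is illustrative.

```python
# Rows copied from the GAS/GBS alignment of Example 1069.
query   = "LTAPSVQFDETTGDYSRSHRVSLKGYYKGRKIAKANEAK"
midline = "+TAPSVQFDETTGDYSRSHRVSLKGYYKGRKIAKANEAK"
sbjct   = "MTAPSVQFDETTGDYSRSHRVSLKGYYKGRKIAKANEAK"

def count_matches(query, midline, sbjct):
    """Return (identities, positives, alignment_length) for one aligned block."""
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("alignment rows must have equal length")
    # Identical residues, ignoring positions where both rows hold a gap.
    identities = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    # Every non-blank midline character marks an identity or a positive.
    positives = sum(c != " " for c in midline)
    return identities, positives, len(query)

print(count_matches(query, midline, sbjct))
```

The counts obtained (38 identities and 39 positives over 39 aligned positions) agree with the header of this alignment.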






Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1070

A DNA sequence (GBSx1144) was identified in S.agalactiae <SEQ ID 3291> which encodes the amino
acid sequence <SEQ ID 3292>. This protein is predicted to be histidyl-tRNA synthetase (hisS). Analysis of 
this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4357 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10275> which encodes amino acid sequence <SEQ ID 
10276> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA78919 GB:Z17214 histidine- -tRNA ligase [Streptococcus 
equisimilis] 

Identities = 327/404 (80%) , Positives = 361/404 (88%) 

Query: 32 WQYVE1WIRNLFKQYHYDEIRTPMFEHYEVISRSVGDTTDIVTKEMYDFHDKGDRHITLR 91

WQYVE V R FKQYHY «IRTPMFEHYEVISRSVGDTTDIVTKEMYDF+DKGDRHITLR 
Sbjct: 1 WQYVEGVARETFKQYHYGEIRTPMFEHYEVISRSVGDTTDIVTKEMYDFYDKGDRHITLR 60 

Query: 92 PEGTAPWRSYVENKLFAPEVQKPTKMYYIGSMFRYERPQAGRLREFHQVGVECFGSNNP 151 
PEGTAPWRSYVENKLFAPEVQKP K+YYIGSMFRYERPQAGRLREFHQ+GVECFGS NP

Sbjct: 61 PEGTAPVvRSYVENKLFAPEvQKPVKLYYIGSMFRYERPQftGRIiREFHQIGVECFGSANP 120 

Query: 152 ATDVETIAMGHHLFEDLGIKNVKLHIJlISI^PESRQAYRQALIDYLTPIREQLSKDSQRR 211 
ATDVETIAM +HLFE LGIK V LHLNSLGN SR AYRQALIDYL+P+R+ LSKDSQRR 
Sbjct: 121 ATDWTIAI^YHLFERLGIKGVTLHLNSLGNRASRAAYRQALIDYLSPMRDTLSKDSQRR 180

Query: 212 LNENPLRVLDSKEPEDKLAVENAPS ILDYLDESSQAHFDAVCHMLDAIiNI PYI IDTNMVR 271 

L+ENPLRVLDSKE EDK+AV NAPSILDY DE SQAHFDAV ML+AL IPY+IDTNMVR 
Sbjct: 181 LDENPLRVLDSKEKEDKIAVANAPSILDYQDEESQAHFDATOSMLEAIAIPWIDTNMVR 240 






Query: 272 GLDYYNHTI FEF I TE IEDNELT I CAGGRYDGLVSYFGGPETPAFGFGLGLERLLLI LDKQ 331 

GLDYYNHTIFEFITE++ +ELTICAGGRYDGLV YFGGP TP FGFGLGLERLLLILDKQ 
Sbjct: 241 GLDYYNHTIFEFITEVDQSELTICAGGRYDGLVEYFGGPATPGFGFGLGLERLLLILDKQ 300 



Query: 332 GISLPIENTIDLYIAVLGSEANLAALDLAQSIRHQGFKVERDYLGRKIKAQFKSADTFNA 391

G+ LP+E +D+YIAVLG++AN+AAL L Q+IR QGF VERDYLGRKIKAQFKSADTF A 
Sbjct: 301 GVELPVEEGLDVYIAVLGADAWAAIALTQAIRRQGFTVERDYLGRKIKAQFKSADTFKA 360 

Query: 392 KVIMTLGS SEVDSKEVGLKNNQTRQE VKVS FENI KTDFS S VLKQ 435 
KV++TLG SE+ + + LK+NQTRQE+ VSF+ I+TDF+S+ +

Sbjct: 361 KWITLGESE I KAGQAVLKHNQTRQEMTVSFDQI QTDFAS I FAE 404 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3293> which encodes the amino acid 
sequence <SEQ ID 3294>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3183 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 339/424 (79%) , Positives = 387/424 (90%) 
Query: 13 MKLQKPKGTQDILPGESAKWQYVENVIRNLFKQYHYDEIRTPMFEHYEVISRSVGDTTDI 72

MKLQKPKGTQDILPG++AKWQYVE+V R+ F QY+Y EIRTPMFEHYEVISRSVGDTTDI 
Sbjct: 1 MKLQKPKGTQDILPGDAAKWQYVESVARDTFSQYNYGEIRTPMFEHYEVISRSVGDTTDI 60 


Query: 73 VTKEMYDFHDKGDRHITLRPEGTAPWRSYVENKLFAPEVQKPTKMYYIGSMFRYERPQA 132 

VTKEMYDF+DKGDRHITLRPEGTAPWRSYVENKLFAPEVQKP K+YYIGSMFRYERPQA 
Sbjct: 61 VTKEMYDFYDKGDRHITLRPEGTAPWRSYVENKLFAPEVQKPVKLYYIGSMFRYERPQA 120 

Query: 133 GRLREFHQVGVECFGSNNPATDVETIAMGHHLFEDLGIKNVKLHLNSLGNPESRQAYRQA 192

GRLREFHQ+GVECFG+ NPATDVETIAM +HLFE LGIK+V LHI1NSLG+ PESR AYRQA 
Sbjct: 121 GRLREFHQIGVECFGAANPATDVETIAMAYHLFEKLGIKDVTLHLNSLGSPESRAAYRQA 180 

Query: 193 LIDYLTPIREQLSKDSQRRLNENPLRVLDSKEPEDKLAVENAPSILDYLDESSQAHFDAV 252 
LIDYLTP+R+QLSKDSQRRL+ENPLRVLDSKE EDKLAVE APSILDYLDE SQAHF+AV

Sbjct: 181 LIDYLTPMRDQLSKDSQRRLDENPLRVLDSKEKEDKLAVEKAPSILDYLDEESQAHFEAV 240 

Query: 253 CHMLDALNIPYIIDTNMVRGLDYYNHTIFEFITEIEDNELTICAGGRYDGLVSYFGGPET 312 
ML+AL+IPY+IDTNMVRGLDYY+HTIFEFIT +E ++LTICAGGRYD LV YFGGPET 
Sbjct: 241 KDMLEALDIPYVIDTNMVRGLDYYSHTIFEFITSVEGSDLTICAGGRYDSLVGYFGGPET 300

Query: 313 PAFGFGLGLERLLLILDKQGISLPIENTIDLYIAVLGSEANLAALDLAQSIRHQGFKVER 372 

P FGFGLGLERLL+I++KQGI+LPIE +D+Y+AVLG AN AL+L Q+IR QGF ER 
Sbjct: 301 PGFGFGLGLERLLMIIEKQGITLPIETEMDIYLAVLGDGANSKALELVQAIRRQGFTAER 360 


Query: 373 DYLGRKIKAQFKSADTFNAKVIMTLGSSEVDSKEVGLKNNQTRQEVKVSFENIKTDFSSV 432 

DYLGRKI KAQFKSADTF AK++MTLG SEV++ + +KNN++RQEV+VSFE++ T+F+++ 
Sbjct: 361 DYLGRKIKAQFKSADTFKAKLVMTLGESEVE2«3KAVIKNNRSRQEVEVSFEDMMTNFANI 420 

Query: 433 LKQL 436

+QL 

Sbjct: 421 SEQL 424 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 1071 

A DNA sequence (GBSx1145) was identified in S.agalactiae <SEQ ID 3295> which encodes the amino
acid sequence <SEQ ID 3296>. This protein is predicted to be aspartyl-tRNA synthetase (aspS). Analysis of 
this protein sequence reveals the following: 

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.5124 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10273> which encodes amino acid sequence <SEQ ID 
10274> was also identified. 

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB14714 GB:Z99118 aspartyl-tRNA synthetase [Bacillus subtilis] 
Identities = 339/585 (57%) , Positives = 432/585 (72%) , Gaps = 9/585 (1%) 

Query: 20 RSMYAGRVRSEHIGTSITLKGWVGRRRDLGGLIFIDLRDREGIMQLVINPEEVSASVMAT 79 
R+ Y G + + IG S+TLKGWV +RRDLGGLIFIDLRDR GI+Q+V NP+ VS +A

Sbjct: 4 RTYYCGDITEKAIGESVTLKGWQKRRDLGGLIFIDLRDRTGIVQVVFNPD-VSKEALAI 62 



Query: 80 AESLRSEFVIEVSGWTAREQA- -NDNLPTGEVELKVQELSILNTSKTTPFEIKDGIE-A 136 
AE +R+E+V+++ G V ARE+ N NL TG +E+ +++I1N +KT PF I D E 
Sbjct: 63 AEGIRNEYVLDIQGKWAREEGTWPNLKTGAIEIHADGVNVLNAAKTPPFAISDQAEEV 122

Query: 137 OT)DTRMRYRYLDLRRPEMLENFKLRAKVTHSIRNYLDNLEFIDVETPMLTKSTPEGARDY 196 

++D R+++RYLDLRRP M + +LR VT ++R++LD F+D+ETP+LT STPEGARDY 
Sbjct: 123 SEDVRLKHRYLDLRRPAMFQTMQliRHNVTKaVRSFLDENGFLDIETPILTGSTPEGARDY 182 

Query: 197 LVPSRVNQGHFYALPQSPQITKQLLMNAGFDRYYQIVKCFRDEDLRGDRQPEFTQVDLET 256 

LVPSRV++G FYALPQSPQ+ KQLLM +G +RYYQI +CFRDEDLR DRQPEFTQ+D+E 
Sbjct: 183 LVPSRVHEGEFYALPQSPQLFKQLLMVSGIERYYQIARCFRDEDLRADRQPEFTQIDIEM 242 

Query: 257 SFLSDQEIQDIVEGMIAKVMKDTKGLEVSLPFPRMAYDDAMNNYGSDKPDTRFDMLLQDL 316 

SF+S ++I + E M+AKVM++TKG E+ LP PRM YD+AMN YGSDKPDTRFDMLL D+ 
Sbjct: 243 SFMSQEDIMSLAEEMMAKVMRETKGEELQLPLPRMTYDEAMNKYGSDKPDTRFDMIiLTDV 302 

Query: 317 TEIVKEVDFKVFSEA SWKA.IWKDKADKYSRKNIDKLTEIAKQYGAKGLAWLKYA 372

++IVK+ +FKVFS A WKAI VK A YSRK+ID L A YGAKGIAW+K 

Sbjct: 303 SDIVKDTEFKVFSSAVANGGWKAimnCGGAGDYSRKDIDALGAFAANYGAKGIAWVKVE 362 

Query: 373 DNTISGPVAKFL-TAIEGRLTEALQLENNDLILFVADSLEVANETLGALRTR1AKELELI 431 
+ + GP+AKF + +L EAL DL+LF AD EV +LGALR ++ KE LI

Sbjct: 363 ADGVKGPIAKFFDEEKQSKLIEALDAAEGDLLLFGADQFEWAASLGALRLKLGKERGLI 422 

Query: 432 DYSKFNFLVWVDWPMFEWSEEEGRYMSAHHPFTLPTAETAHELEGDIiAKVRAVAYDIVLN 491 
D FNFLWV+DWP+ E EEGR+ +AHHPFT+P E +E ++A AYD+VLN 

Sbjct: 423 DEKLFNFLWVIDWPLLEHDPEEGRFYAAHHPFTMPVREDLELIETAPEDMKAQAYDLVLN 482

Query: 492 GYELGGGSLRINQKDTQERMFKALGFSAESAQEQFGFLLEAMDYGFPPHGGLAIGLDRFV 551 

GYELGGGS+RI +KD QE+MF LGFS E A EQFGFLLEA +YG PPHGG+A+GLDR V 
Sbjct: 483 GYELGGGSIRIFEKDIQEKMFALLGFSPEEAAEQFGFLLEAFEYGAPPHGGIALGLDRLV 542 
Query: 552 MLLAGKDNIREVIAFPKNNKASDPMTQAPSLVSEQQLEELSLTVE 596

MLLAG+ N+R+ IAFPK AS MT+AP VS+ QL+EL L+++ 
Sbjct: 543 MLLAGRTNLRDTIAFPKTASASCLMTEAPGEVSDAQLDELHLSIK 587 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3297> which encodes the amino acid
sequence <SEQ ID 3298>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have an uncleavable N-term signal seq 

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 495/582 (85%) , Positives = 538/582 (92%) 

Query: 18 MKRSMYAGRVRSEHIGTSITLKGWVGRRRDLGGLIFIDLRDREGIMQLVINPEEVSASVM 77 
MKRSMYAGRVR EHIGT+ITLKGWV RRRDLGGLIFIDLRDREG+MQLVINPEEVS+ VM 
Sbjct: 18 MKRSMYAGRVREEHIGTTITLKGWVSRRRDLGGLIFIDLRDREGVMQLVINPEEVSSDVM 77

Query: 78 ATAESLRSEWIEVSGVVTAREQANDNLPTGEvELKVQELSILNTSKTTPFEIKDGIEAN 137 

ATAE LRSE+VIEV G V AR+QAND L TG VELKV L+ILNT+KTTPFEIKD +E + 
Sbjct: 78 ATAERLRSEYVIEVEGFVFARQQANDKLATGMVELKVSALTILNTAKTTPFEIKDDVEVS 137 


Query: 138 DDTRMRYRYLDLRRPEMLENFKLRAKVTHSIRNYLDNLEFIDVETPMLTKSTPEGARDYL 197 

DDTR+RYRYLDLRRPEMLENFKLRAKVTHS I RNYLD+LEF IDVETPMLTKSTPEGARDYL 
Sbjct: 138 DDTRLRYRYLDLRRPEMLENFKLRAKVTHSIRNYLDDLEFIDVETPMLTKSTPEGARDYL 197 

Query: 198 VPSRVNQGHFYALPQSPQITKQLLMNAGFDRYYQIVKCFRDEDLRGDRQPEFTQVDLETS 257

VPSRV+QGHFYALPQSPQITKQLLMNAGFDRYYQIVKCFRDEDLRGDRQPEFTQVDLETS 
Sbjct: 198 VPSRVSQGHFYALPQSPQITKQLLMNAGFDRYYQIVKCFRDEDLRGDRQPEFTQVDLETS 257 



Query: 258 FLSDQEIQDIVEGMIAKVMKDTKGLEVSLPFPRMAYDDAMNNYGSDKPDTRFDMLLQDLT 317
FLS+QEIQDIVEGMIAKVMK+TK ++V+LPFPRM+YD AMN+YGSDKPDTRF+MLLQDLT
Sbjct: 258 FLSEQEIQDIVEGMIAKVMKETKEIDVTIiPFPRMSYDVAMNSYGSDKPDTRFEMLLQDLT 317



Query: 318 EIVKEVDFKVFSEASVVKAIWKDKADKYSRKMIDKLTEIAKQYGAKGLAWLKYADNTIS 377 

VK DFKVFSEA VKAIWK AD+YSRK+IDKLTE AKQ+GAKGLAW+K D ++ 
Sbjct: 318 VTVKGOTJFKVFSEAPAVKAIVVKGNADRYSRKDIDKIiTEFAKQFGAKGl^WVKVTDGQLft. 377 

Query: 378 GPVAKFLTAIEGRLTEALQLENNDLILFVADSLEVANETLGALRTRIAKELELIDYSKFN 437 

GPVAKFLTAIE L+ L+L NDL+LFVAD+LEVAN TLGALR RIAK+L++ID S+FN 
Sbjct: 378 GPVAKFLTAIETELSSQLKLAENDLVLFVADTLETTONOTLGALRNRIAKDLDMIDQSQFN 437 

Query: 438 FLPi/VVDWPMFEWSEEEGRYMSAHHPFTLPTAETAHEIjEGDIAKVRAVAYDIVLNGYELGG 497 

FLWVVDWPMFEWSEEEGRYMSAHHPFTLPT E+AHELEGDLAKVRA+AYD I VLNGYELGG 
Sbjct: 438 FLWVVDWPMFEWSEEEGRYMSAHHPFTLPTPESAHELEGDLAKVRAIAYDIVLMGYELGG 497 

Query: 498 GSLRINQKDTQERMFKALGFSAESAQEQFGFLLEAMDYGFPPHGGLAIGLDRFVMLLAGK 557 

GSLRINQK+ QERMFKALGF+A-i- A +QFGFLLEAMDYGFPPHGGLAIGLDRFVMLLAGK 
Sbjct: 498 GSLRINQKEMQERMFKALGFTADEANDQFGFLLEAMDYGFPPHGGLAIGLDRFVMLLAGK 557 

Query: 558 DNIREVIAFPKNNKASDPMTQAPSLVSEQQLEELSLTVESYE 599 

DNIREVIAFPKNNKASDPMTQAPSLVSE QLEELSL +ES++ 
Sbjct: 558 DNIREVIAFPKNNKASDPMTQAPSLVSENQLEELSLQIESHD 599 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1072 

A DNA sequence (GBSx1146) was identified in S.agalactiae <SEQ ID 3299> which encodes the amino
acid sequence <SEQ ID 3300>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 
Final Results

bacterial membrane Certainty=0.4376 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12952 GB:Z99109 alternate gene name: yuxA-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 104/275 (37%), Positives = 181/275 (65%), Gaps = 1/275 (0%) 

Query: 39 EKISASLLYGILSSVAVNFFFQPGHVYSSGATGLAQVISAVSKHWFSFEIPVALAFYAIN 98 

+K+ ++ +L++ +N F P VY+SG TG+AQ++S+V + F I + +N 

Sbjct: 7 KKLLIVIIGALUJAAGLNLFLIPADVYASGFTGVAQLLSSVVDQYAPFYISTGTLLFLLN 66 

Query: 99 IPLLILSWRKIGHKFTIFTFITVTVSSIFIQLMPQITLTTDPLINAIFGGLIMGAGVGFS 158 

IP+ IL W K+G FT+++ ++V ++++F+ ++P+ +L+ D L+NA+FGG+I G+G + 
Sbjct: 67 IPVGILGWLKVGKSFTVYSILSVALTTLFMGILPETSLSHDILLNAVFGGVISAVGIGLT 126 

Query: 159 FKSRISSGGTDIISLTIRKKTGRDVGSISFIINGIILLFAGLLFGWKYALYSMVTIFVSS 218 

K S+GG DI+++ + K + VG+ FI+NGII+L AGLL GW+ ALY++VT++V++ 
Sbjct: 127 LKYGASTGGLDIVAIWIAKWKDKPVGTYFFIMGIIILTAGLLQGWEKALYTLOTLYVTT 186 

Query: 219 RVTDAIFTKQKKMQAMIVTSKPYCTIKRIHRDIiHRGvTCINDAEGTYNHEKKAVLITILT 278 

RV DAI T+ K+ AMIVT K + + 1+ + RG+T + A+G + +E+K ++I ++T 
Sbjct: 187 RVIDAIHTRHMKLTAMIvTKKADEIKEAIYGK^IVRGITTV-PAKGAFTNEQKE^MIIVIT 245 



Query: 279 REEFSDFKYLMLKADPKAFVSVAENVHIIGRFVDD 313
R E D + ++ + DPKAF ++ + IGF D
Sbjct: 246 RYELYDLEKIVKEVDPKAFTNIVQTTGIFGFFRKD 280

WO 02/34771 PCT/GB01/04789

A related DNA sequence was identified in S.pyogenes <SEQ ID 3301> which encodes the amino acid 
sequence <SEQ ID 3302>. Analysis of this protein sequence reveals the following:

Possible site: 53 
>>> Seems to have no N-terminal signal sequence 



Final Results

bacterial membrane Certainty=0.3187 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAA66894 GB:X98238 orf2 [Lactobacillus sakei]

Identities = 105/280 (37%) , Positives = 180/280 (63%) , Gaps = 7/280 (2%) 

Query: 37 AEKISASLLYGILSSIAVNFFFQPGHVYSSGATGLAQVFSAL-SHRLLGYDFPIAFAFYL 95 
+++YG L++++VN F P YSSG TG+AQ+ +AL SH LG +A ++ 
Sbjct: 8 SKRIVIAMVYGFLAAVSVNLFLIPAKTYSSGVTGVAQLLTALVSH--LGGSLSVAALVFI 65

Query: 96 INIPLLILAWYKIGHQFTIFTFITVSMSSFFIQIMPQVT--LTTDPLINAIFGGLVMGMG 153 

+N+PLL+LAW+KI HQ+ IF+ + V S F++I+P + T+ A+FGG ++G+G 

Sbjct: 66 IiNVPLLvLAWFKINHQYAIFSIVAVFTSVIFLKIIPVPVQPILTERFAGALFGGALIGLG 125 


Query: 154 IGTGLKSRISSGGTDIVSLTLRKRTGKDVGSLSLMVNGAIIiAFAGILFGWQYALYSMVSI 213 

+G ++ S+GGTD++ + + TGK VG+++ ++NG 1+ AGI FGW ALYS+V I 
Sbjct: 126 VGLCFRAGFSTGGTDVIVTLVGRLTGKRVGAVNIWINGMIILAAGIFFGWGAALYSIvEI 185 

Query: 214 FVS SR VTDAI FTKQKKMQATI VTSHPERVI HM I HKRLHRG VTS INDAEGTYKHEQKAVLI 273

FVSS + D I+T+Q+K+ TIT PE+ + + +HGT + D GY +++ +V++ 
Sbjct: 186 FVSSLLMDYI YTQQQKVTVTI FTKQPEALKKRMREFIH - GATEL - DGTGLYTNQETS VIM 243 

Query: 274 TILTCEEYPEFKWLMLKTDPQAFVSVAENVRIIGRFVEDD 313 
T+++ + K ++ DP AFV++ + + GRF ++

Sbjct: 244 TWSKYDLTALKLWQDADPNAFVNIQSTMNLWGRFESNE 283 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 239/311 (76%) , Positives = 274/311 (87%) 


Query: 4 RRTPLEKKVKYI I SVWAKKFGLLHTLKSI SREKYAEKISASLLYGI LSSVAVNFFFQPGH 63 

++T +KKVKY+IS AKK GLLH L+SISREKYAEKISASLLYGILSS+AVNFFFQPGH 
Sbjct: 3 KKTTYKKKVKYVISRGAKKVGLLHALRSISREKYAEKISASLLYGILSSIAVNFFFQPGH 62 

50 Query: 64 VYSSGATGLAQVISAVSKHWFSFEIPVAIAFYAINIPLLILSTOKIGHKFTIFTFITVTV 123 

VYSSGATGLAQV SA+S ++ P+A AFY INIPLLIL+W KIGH+FTIFTFITV++ 

Sbjct: 63 VYSSGATGLAQVFSALSHRLLGYDFPIAFAFYLINIPLLILAWYKIGHQFTIFTFITVSM 122 

Query: 124 SSIFIQLMPQITLTTDPLINAIFGGLIMGAGVGFSFKSRISSGGTDIISLTIRKKTGRDV 183 
SS FIQ+MPQ+TLTTDPLINAIFGGL+MG G+G KSRISSGGTDI+SLT+RK+TG+DV
Sbjct: 123 SSFFIQIMPQVTLTTDPLINAIFGGLVMGMGIGTGLKSRISSGGTDIVSLTLRKRTGKDV 182

Query: 184 GSISFIINGIILLFAGLLFGWKYALYSMvTIFVSSRVTDAIFTKQKKMQAMIVTSKPYCV 243 
GS+S ++NG IL FAG+LFGW+YALYSMV+IFVSSRVTDAIFTKQKKMQA IVTS P V 
Sbjct: 183 GSLSLMWGAIl^FAGILFGWQYALYSMVSIFVSSRVTDAIFTKQKKMQATIVTSHPERV 242

Query: 244 IKRIHRDLHRGVTCINDAEGTYmEKKAVLITILTREEFSDFKYLMLKADPKAFVSVAEN 303 

I IH+ LHRGVT INDAEGTY HE+KAVLITILT EE+ +FK+LMLK DP+AFVSVAEN 
Sbjct: 243 IHMIHKRLHRGVTSINDAEGTYKHEQKAvLITILTCEEYPEFKWLMLKTDPQAFVSVAEN 302 
Query: 304 VHIIGRFVDDD 314

V IIGRFV+DD 
Sbjct: 303 VRIIGRFVEDD 313 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1073 

A DNA sequence (GBSx1147) was identified in S.agalactiae <SEQ ID 3303> which encodes the amino
acid sequence <SEQ ID 3304>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -3.72 Transmembrane 156 - 172 ( 156 - 174) 

INTEGRAL Likelihood = -3.03 Transmembrane 112 - 128 ( 110 - 129) 

INTEGRAL Likelihood = -2.34 Transmembrane 80 - 96 ( 79 - 96) 

INTEGRAL Likelihood = -1.49 Transmembrane 60- 76 ( 58- 76) 



Final Results 

bacterial membrane Certainty=0.2487 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB05397 GB:AP001512 unknown conserved protein [Bacillus halodurans] 
Identities = 113/278 (40%) , Positives = 192/278 (68%) , Gaps = 1/278 (0%) 



Query: 7 KTKI KETI LI AFGVALYTFGFVKFNMANHLAEGGI SGVTLI IHALPGVNPALSSLLLNI P 66
+ K K +1 G A+++FG V FNM N+LAEGG +G+TLI++ +F +NPA+++L+LNIP
Sbjct: 4 RLKWKNIVFILLGSAIFSFGLVYFNMENNIAEGGFTGIT^^ 63

Query: 67 LFILGARILGKKSLLLTIYGTVLMSFFMWFWQQI P - VTVPLKNDMMLVAVAAGILAGTGS 125
+ ++G +ILG+ +L+ TI GTV +S F+ +Q+ + +PL +DM L A+ AG+ GTG
Sbjct: 64 ILLIGWKILGRVTLIYTIIGTVSVSVFLEMFQRWKFMDIPLHDDMTLAALFAGVFVGTGL 123

Query: 126 GLVFRYGATTGGADIIGRIVEEKSGIKLGQTLLFIDAIVLTSSLVYINLQQMLYTLVASF 185
G+VFR+G TTGG DII ++ G +G+T+ DA+V+ SSL+Y+N ++ +YTL+A F
Sbjct: 124 GIVFRFGGTTGGVDIIAKLGFRYLGWSMGKTMFMFDAWIASSLIYU^TOEAMYTLLAVF 183

Query: 186 VFSQVLTNVENGGYTVRGMIIITKESESAAATILHEINRGVTFLRGQGAYSGREHDVLYV 245
+ ++ v+ ++ Y+ + II++ +E+ A TIL E+ RG T L+G+G+++G E ++LY
Sbjct: 184 IAAKVIDFIQQTAYSAKAAFIISEHTEAIADTILKEMERGATTLKGKGSFTGTEKEILYC 243

Query: 246 ALNPSEVRDVKEIMADLDPDAFISVINVDEVISSDFKI 283
+ +E+ +K ++ +DP AF++V +V +VI F +
Sbjct: 244 WGRNELIRLKSLVERIDPHAFVTVNDVQDVIGEGFTL 281





A related DNA sequence was identified in S.pyogenes <SEQ ID 3305> which encodes the amino acid 
sequence <SEQ ID 3306>. Analysis of this protein sequence reveals the following: 

Possible site: 26 
>>> Seems to have a cleavable N-term signal seq. 
Final Results

bacterial membrane Certainty=0.3060 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the databases:

>GP:BAB05397 GB:AP001512 unknown conserved protein [Bacillus halodurans]
Identities = 116/276 (42%) , Positives = 182/276 (65%) , Gaps = 1/276 (0%) 



Query : 






0 0 






K +1 LG AI++FG V FNIVI N L1AEGC3 GXTJj1J-i+ r 1JNPA .N+F+ 




Sb j ct : 


6 


KWKNIVFILLGSAIFSFGLVYFIWENN^ 


65 


Query: 


C Q 

by 




1 0*7 






1 t (~* VT Pi 1 T rnT /-irrTT 7 , O t» 1 ft/I 1 ^ i 1 1 1 T 1 T^M T 7\ 1 1\P 1 U Z™ 1 /""T 

++b KI G+ +ii TJL GiV +b F+ 1VI+U+ +++ Jj -flJIYl Li A+ Jv&+r G b Gl 




OK-i ,-,4- . 


DO 




125 


Query: 128 VFRYGATTGGTDIIGRIAEEKFGAKLGQTLLLVDALVLTASLTYVDLKHMLYTLVASFVF 187
VFR+G TTGG DII ++ G +G+T+ + DA+V+ +SL Y++ + +YTL+A F+
Sbjct: 126 VFRFGGTTGGVDIIAKLGFRYLGWSMGKTMFMFDAWIASSLIYOTfRFAMYTLIAVFIA 185

Query: 188 SQMISWQNGGYTIRGMI I ITKHSFJU^QAILTEimGVTYLKGQGAYSGNDYNIMYVTL 247
+++I +Q Y+ + II++H+EA A IL E+ RG T LKG+G+++G + I+Y +
Sbjct: 186 AKV1DFIQQTAYSAKAAFI1SEHTEAIADTILKEMERGATTLKGKGSFTGTEKEILYCW 245

Query: 248 NPTEVREVKRILAGLDPDAFISIIDVDEVISSDFKI 283
E+ +K ++ +DP AF+++ DV +VI F +
Sbjct: 246 GRNELIRLKSLVERIDPHAFVTVNDVQDVIGEGFTL 281





An alignment of the GAS and GBS proteins is shown below. 

Identities = 206/286 (72%) , Positives = 250/286 (87%) 



Query: 5 DLKTKIKETILIAFGVALYTFGFVKFNMANHIiREGGISGVTLIIHALFGVNPALSSIjLLN 64
D TK+ + LIA GVA+YTFGPV KNMAN LAEGG++G+TLI+HA FG+NPA SSLL N
Sbjct: 5 DKLTKLLKLFLIALGVAIYTFGFvNFNMANALAEGGVAGITLILHAHFGINPAYSSLLFN 64

Query: 65 IPLFILGARILGKKSLLLTIYGTVI.MSFFmFWQQIPVTVPLKWDMMLVAVAAGILAGTG 124
+PLFILGA+I GK+SL LTIYGTVLMS F+W WQ++P+ + L+NDMMLVAV AG+ +G G
Sbjct: 65 LPLFILGAKIFGKRSIALTIYGTVLMSAFIVMWQKVPIELGLENDMMLVAVVAGLFSGIG 124

Query: 125 SGLVFRYGATTGGADIIGRIVEEKSGIKLGQTLLPIDAIVLTSSLVYINLQQMLYTLVAS 184
SG+VFRYGATTGG DIIGRI EEK G KLGQTLD +DA+VLT+SL Y++L+ MLYTLVAS
Sbjct: 125 SGIVFRYGATTGGTDIIGRIAEEKFGAKLGQTBLLVDALVLTASLTYVDIiKHMLYTLVAS 184

Query: 185 FVFSQVLTNVENGGYTVRGMI I ITKESESAAATILHEIKRGVTFLRGQGAYSGREHDVLY 244
FVFSQ+++ V+NGGYT+RGMIIITK SE+AA IL EINRGVT+L+GQGAYSG +++++Y
Sbjct: 185 FVFSQMISWQNGGYTIRGMIIITKHSEAAAQAILTEINRGVTYLKGQGAYSGNDYNIMY 244

Query: 245 VAI^PSEVRDVKEIMADLDPDAFISVINVDEVISSDFKIRRRNYDK 290
V LNP+EVR+VK I+A LDPDAFIS+I+VDEVISSDFKIRRRNYDK
Sbjct: 245 VTLNPTEVREVKRILAGLDPDAFIS I IDVDEVISSDFKIRRRNYDK 290





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1074 

A DNA sequence (GBSx1148) was identified in S.agalactiae <SEQ ID 3307> which encodes the amino
acid sequence <SEQ ID 3308>. This protein is predicted to be BacB protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 60 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.4355 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA11330 GB:D78257 BacB [Enterococcus faecalis]
Identities = 27/88 (30%) , Positives = 48/88 (53%) , Gaps = 1/88 (1%) 

Query: 1 MPSEKEILDALSKVYSEEVIQADDYFRQAIFELASQLEKEGMN-SLLATKIDSLINQYVL 59 

M ++E+LD LSK Y++ I + + +FE A +L N + K+ ++ ++Y+ 

Sbjct: 1 MDKQQELLDLLSKAYJSTOPKIlffiYEGLKDKLFEayCRLTTlffiTNIGEVCYKLSTINSEY^ 60 

Query: 60 THQFDAPKS I FDLSRLVKTKASHYKGTA 87 

H F+ PKSI +L + V + Y+G A 
Sbjct: 61 RHHFEMPKSIIELQKFVTKEGQKYRGWA 88 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3309> which encodes the amino acid
sequence <SEQ ID 3310>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2712 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 99/102 (97%) , Positives = 102/102 (99%) 

Query: 1 MPSEKEILDALSKVYSEEVIQADDYFRQAIFELASQLEKEGMNSLLATKIDSLINQYVLT 60
MPSEKEILDALSKVYSE+VIQADDYPRQAIFELASQLEKEGM+SLLATKIDSLINQY+LT
Sbjct: 7 MPSEKEILDALSKVYSEQVIQADDYFRQAIFELASQLEKEGMSSLLATKIDSLINQYILT 66






Query: 61 HQFDAPKSIFDLSRLVKTKASHYKGTAISAIMLGSFLSGGPK 102 

HQFDAPKS I FDLSRLVKTKASHYKGTAI SAIMLGS FLSGGPK 
Sbjct: 67 HQFDAPKS I FDLSRLVKTKASHYKGTAI SAIMLGSFLSGGPK 108 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1075 

A DNA sequence (GBSx1149) was identified in S.agalactiae <SEQ ID 3311> which encodes the amino
acid sequence <SEQ ID 3312>. This protein is predicted to be ArgS (argS). Analysis of this protein
sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2522 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10271> which encodes amino acid sequence <SEQ ID
10272> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



>GP:AAF86984 GB:AF282249 ArgS [Lactococcus lactis subsp. lactis]
Identities = 377/566 (66%) , Positives = 464/566 (81%) , Gaps = 5/566 (0%) 
Query: 12 MDTKHLIASEIQKWPD-MEQSTILSLLETPKNSSMGDLAFPAFSLAKTLRKRPQIIASD 70

MD K L++ + + + I +++E PK+S +GDLAFPAF IAKTLRK+PQIIA + 
Sbjct: 1 MDEKQLVSQALSAAIDGVLGVEQIAAIIEKPKSSDLGDIAFPAFQIjAKTIjRKSPQIIAGE 60 

Query: 71 IAEQIKSDQFEKVEAVGPYVNFFLDKAAISSQVLKQVLSDGSAYATQNIGEGRNVAIDMS 130

IAE+I + FEKV AVGPYVNFFLDK A +S+V+++VL++G Y NIGEG NV IDMS 
Sbjct: 61 IAEKIDTKGFEKVIAVGPYWFFLDKNATASEVIREVLAEGEHYGDANIGEGGNVPIDMS 120 

Query: 131 SPNIAKPFSIGHLRSWIGDSIANIFDKIGYHPVKINHLGDWGKQFGMLIVAYKKWGNEE 190 
+PNIAKPFSIGHLRSTVIGDS+A I++K+GY P+KINHLGDWGKQFG+LI AYKK+G+E

Sbjct: 121 APNIAKPFSIGHLRSTVIGDSIAKIYEKLGYQPIKINHLGDWGKQFGLLITAYKKYGDEA 180 

Query: 191 AVRAHPIDELLKLYVRINAEAETDPSVDEEAREWFRKLEANDPEATELWQWFRDESLLEF 250 
+ A+PIDELLKLYV+INAEA+ D VDEE R+WF K+E D EA +W+WF D SL+EF 
Sbjct: 181 TITANPIDELLKLYVKINAEAKEDSEVDEEGRQWFLKMEQGDEEALRIWKWFSDVSLIEF 240

Query: 251 mLYDQMNVTFDSYNGEAFYNDKMDEVLELLESKNLLVESKGAQVVNLEKYGIEHPALIK 310 

NR+Y ++ VTFD + GE+FY+DKMD ++E LE+KNLL ESKGA +V+LEKY + +PALIK 
Sbjct: 241 miYGKLGVTFDHFMGESFYSDKMDAIVEDLENKNLLHESKGALIVDLEKYNL-NPALIK 299 


Query: 311 KSDGATLYITRDIAAALYRKRTYDFAKSIYWGNEQSAHFKQLKAVLKEMDYDWSDDMTH 370 

K+DGATLYITRDLA A YRK+T++F KS+YWG EQ+ HFKQLKAVLKE YDWSDDM H 
Sbjct: 300 KTDGATLYITRDIATAAYRKKTFNFVKSLYWGGEQTT3HFKQLKAVLKEAGYDWSDDMVH 359 

Query: 371 VPFGLOTKGGAKLSTRKG1WILLEPTVAEAINRAASQIEAKNPNIADKDKVAQAVGVGAI 430

VPFG+VT+GG K STRKG+V+ LE + EA++RA QIEAKNPNL +K++VA+ VGVGA+ 
Sbjct: 360 VPFG^OTQGGKKFSTRKGHVVKLEMALDEAVDRAEKQIEAKNPNLENKEEVAKQVGVGAV 419 

Query: 431 KFYDLKTDRTNGYDFDLEAMVSFEGETGPYVQYAHARIQSILRKANFSPSNSDNYSL--N 488 
KFYDLKTDR NGYDFDL+ MVSFEGETGPYVQYAHARIQSILRKAN N DN SL +

Sbjct: 420 KFYDLKTDRNNGYDFDLDE^SFEGETGPWQYAHARIQSILRKAN-RKVNIDNISLWS 478 

Query: 489 DVESWEIIKLIQDFPRIIVRAADNFEPSIIAKFAINLAQCFNKYYAHTRILDEDAEISSR 548 
D E+WEI+K +++FP 1+ RAADN+EPSIIAK+AI+LAQ ENKYYAH RIL++DA++ R 
Sbjct: 479 DAEAWE I VKALKEFPNI VKRAADNYEPS I IAKYAISLAQAFNKYYAHVRILEDDAQLDGR 538

Query: 549 LALCYATATVLKESLRLLGVDAPNEM 574 

LAL AT+ VLKE+LRLLGV AP M 
Sbjct: 539 LALI SATS I VLKEALRLLGVAAPENM 564 






A related DNA sequence was identified in S.pyogenes <SEQ ID 3313> which encodes the amino acid 
sequence <SEQ ID 3314>. Analysis of this protein sequence reveals the following: 



Possible site: 46 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm Certainty=0.1734 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 492/563 (87%) , Positives = 526/563 (93%) 

Query: 12 MDTKHLIASEIQKWPDMEQSTILSDLETPKNSSMGDLAFPAFSLAKTLRKAPQIIASDI 71 
MDTK LIASEI KWP++EQ I +LLETPKNS MGDLAFPAFSLAK LRKAPQ+ IAS + +

Sbjct: 1 MDTKTLIASEIAKWPELEQDAIFNLLETPKNSDMGDLAFPAFSLAKVLRKAPQMIASEL 60 

Query: 72 AEQIKSDQFEKVEAVGPYVNFFLDKAAISSQVLKQVLSDGSAYATQNIGEGRNVAIDMSS 131 
AEQI QFEKV AVGPY+NFFLDKA ISSQVL+QV++ GS YA Q+ G+GRNVAIDMSS 
Sbjct: 61 AEQIDESQFEKVVAVGPYINFFLDKAKISSQVLEQVITAGSDYAQQDEGQGRNVAIDMSS 120

Query: 132 PNIAKPFSIGHLRSTVIGDSLANIFDKIGYHPVKINHLGDWGKQFGMLIVAYKKWGNEEA 191 

PNIAKPFS IGHLRSTVIGDSLA+ IF K+GY PVKINHLGDWGKQFGMLIVAYKKWG+E A 
Sbjct: 121 PNIAKPFSIGHLRSWIGDSLAHIFAKMGYKPVKINHXiGDWGKQFGMLIVAYKKWGDEAA 180 

Query: 192 VRAHPIDELLKLYVRINAEAETDPSVDEEAREWFRKLEANDPEATELWQWFRDESLLEFN 251
V+AHPIDELLKLYWINAEAETDP+VDEEAREWFRKLE D FATELWQWFRDESLLEFN
Sbjct: 181 VQAHPIDELLKLYVRINAEAETDPTVDEEAREWFRKIiEDGDKEATELWQWFRDESLLEFN 240

Query: 252 RLYDQMNOTFDSYNGFAFYISIDKMDEVLELI^ 311
RLYDQ++VTFDSYWGFAFYNDKMDEVL+LLE+KNLLVESKGAQVVNLEKYGIEHPALIKK
Sbjct: 241 RLYDQLHVTFDSYNGEAFYjtsTOKMDEVLDLLEAKNLLVESKGAQVVNIjEKYGIEHPALIKK 300

Query: 312 SDGATLYITRDLflAALyRKRTYDFAKSIYWGNEQSAHFKQLKAVLKEMDYDWSDDMTHV 371
SDGATLYITRDIAAALYRKRTYDFAKS+YWGNEQ+AHFKQLKAVLKEM YDWSDDMTHV
Sbjct: 301 SDGATLYITRDLAAALYRKRTYDFAKSVYWGNEQAAHFKQLKAVLKEMGYDWSDDMTHV 360

Query: 372 PFGLVTKGGAKLSTRKGOTILLEPTVAEAINRAASQIEAKNPNLADKDKVAQAVGVGAIK 431
FGLVTKGGAKLSTRKGNVILLEPTVAEAINRAASQIEAKNPNLADK+ VA AVGVGAIK
Sbjct: 361 AFGLOTKGGAKIlSTRKGMVILLEPTVAEAI^KftASQIEAKNPl^iADKEAVAHAVGVGAIK 420

Query: 432 FYDLKTDRTNGYDFDLFAMVSFEGETGPWQYAHARIQSILRKANFSPSNSDNYSLNDVE 491
FYDLKTDR NGYDFDLEAMVSFEGETGPYVQYAHARIQSILRKA+F+PS + YSL D E
Sbjct: 421 FYDLKTDRMNGYDFDLEAMVSFEGETGPYVQYAHARIQSILRKADFTPSATTTYSLADAE 480

Query: 492 SWEIIKLIQDFPRIIVRAADNFEPSIIAI<FAINLAQCFMKYYAHTRILDEDAEISSRLAL 551
SWEIIKLIQDFPRII R +DNFEPS I +AKFAINLAQ FNKYYAHTRILD+++E +RLAL
Sbjct: 481 SWEIIKLIQDFPRIIKRTSDNFEPSIMAKFAINLAQSFNKYYAHTRILDDNSERDNRLAL 540

Query: 552 CYATATVLKESLRLLGVDAPNEM 574
CYATATVLKE+LRLLGVDAPNEM
Sbjct: 541 CYATATVLKEALRLLGVDAPNEM 563





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1076 

A DNA sequence (GBSx1150) was identified in S.agalactiae <SEQ ID 3315> which encodes the amino
acid sequence <SEQ ID 3316>. This protein is predicted to be arginine hydroximate resistance protein 
(argR). Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3252 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10269> which encodes amino acid sequence <SEQ ID 
10270> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA88596 GB:M18729 unknown protein [Streptococcus pneumoniae] 
Identities = 63/141 (44%) , Positives = 90/141 (63%) 

Query: 4 MNKIERQKRIKRLIQSGQIGTQEEIKLHLKNEGIDVTQATLSRDLREIGLLKLRSPEGKL 63 

M K +R + IK++I ++ TQ+EI+ L+ + VTQ TLSRDLREIGL K++ + 
Sbjct: 1 MRKRDRHQLIKKMITEEKLSTQKEIQDRLEAHNV(OTQTTLSRDLREIGLTKVKKND^^W 60 

Query: 64 YYSLSTATSNRFSPALRSYILKVSRASFMLVIinMLGEASVIjANFIDEKGLPEILGTMAG 123 

Y ++ L ++ V+RA F LVL+T LGEASVLRN +D ILGT+AG 

Sbjct: 61 YVIVNETEKIDLWLSHHLEGVARAEFTLVI^ 120 

Query: 124 ADTLLVICQNEDIAKVFEKEL 144 

A+TLLVIC+++ +AK+ E L 
Sbjct: 121 ANTLLVI CRDQHVAKLMEDRL 141 
A related DNA sequence was identified in S.pyogenes <SEQ ID 3317> which encodes the amino acid
sequence <SEQ ID 3318>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3176 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 101/145 (69%) , Positives = 121/145 (82%) 

Query: 4 MNKIERQKRIKRLIQSGQIGTQEEIKLHLKNEGIDVTQATLSRDLREIGLLKLRSPEGKL 63 

MNK+ERQ++IKR+IQ+ IGTQE+IK HL+ EGI VTQATLSRDLREIGLLKLR +GKL 
Sbjct: 1 MNKMERQQQIKRIIQAEHIGTQEDIKNHLQKEGIWTQATLSRDLREIGLLKLRDEQGKL 60 

Query: 64 YYSLSTATSNRFSPALRSYILKVSRASFMLVLNTNLGEASVIANFIDEKGLPEILGTMAG 123 

YYSLS + FSP +R Y+LKV RA FMLVL+TNLGEA VLAN ID + +ILGT+AG 
Sbjct: 61 YYSLSEPVATPFSPETOFYVLKOTRAGFMLVLHTNLGEADVLANLIDNDAIEDILGTIAG 120 

Query: 124 ADTLLVICQNEDIAKVFEKELSVGL 148 

ADTLLVIC++E+IAK FEK+L+ GL 
Sbjct: 121 ADTLLVI CRDEE IAKRFEKDIAAGL 145 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1077 

A DNA sequence (GBSx1151) was identified in S.agalactiae <SEQ ID 3319> which encodes the amino
acid sequence <SEQ ID 3320>. This protein is predicted to be DNA mismatch repair protein hexa (mutS). 
Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3570 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA88597 GB:M18729 mismatch repair protein [Streptococcus pneumoniae]
Identities = 593/858 (69%) , Positives = 698/858 (81%) , Gaps = 14/858 (1%)

Query: 1 MAKPT I SPGMQQYLD I KENYPDAFLLFRMGDFYELFYDDA VKAAQILE I SLTSRNKNAEK 60

MA +SPGMQQY+DIK+ YPDAFLLFRMGDFYELFY+DAV AAQILEISLTSRNKNA+
Sbjct: 1 MAIEKLSPGMQQYVDIKKQYPDAFLLFRMGDFYELFYEDAVNAAQILEISLTSRNKNADN 60

Query: 61 PIPMAGVPYHSAQQYIDVLVELGYKVAIAEQMEDPKKAVGVVKREWQvVTPGTVVESTK 120

PIPMAGVPYHSAQQYIDVL+E GYKVAIAEQMEDPK+AVGWKREWQV+TPGTW+S+K
Sbjct: 61 PIPMAGVPYHSAQQYIDvlIEQ^YKVAIAEQIffiDPKQAVGvVKREWQVITPGTVVDSSK 120

Query: 121 PDSANNFLVAIDSQDQQTFGLAYMDVSTGEFQATLLTDFESVRSEIIJJLKAREIWGYQL 180

PDS NNFLV+ID + Q FGLAYMD+ TG+F T L DF V EI NLKARE+V+GY L
Sbjct: 121 PDSQNNFLVSIDREGNQ-FGLAYMDLVTGDBWrGLLDFTLVCGEIRNLKAREVVLGYDL 179

Query: 181 TDEKNHLLTKQMNLLLSYEDERLNDIHLIDEQLTDLEISAAEKLLQYVHRTQKRELSHLQ 240
++E+ +L++QMNL+LSYE E D+HL+D +L +E +A+ KLLQYVHRTQ REL+HL+
V+ YEIKD+LQM YATK SLDL+ENAR+ KK GSL+WLLDETKTAMG R+LR+WI RPL 



+ RI +RQ+++QVFLD+FFER+DLT+SLKGVYDIERLASRVSFGK NPKDLLQL TL 



S +PRI+ IL+ QP L ++ ++D +PELESLI+ AIAPEA IT+G II++GFD+ 



LD YR V+REGT WIA+IEAKER SGI TLKIDYNKKDGYYFHVTNS L VP HFFRK 



ATLKNSER+GT ELA+ IEG+MLEARE+S +NLEY+ 1 FMR+R +V YI+RLQ LA+ IATV 



DVLQSLAWAE H +RP+F D QI 1+ GRHA VEKVMG Q YIPN+I T IQL 



+TGPNMSGKSTYMRQLA+T +MAQ+G +V A+ LP+FDAIFTRIGAADDL+SGQSTFM 



VEMMEAN A+ A+ SLILFDELGRGTATYDGMALRQSIIEYIH+ + AKT+FATHYHE 



LT L L LVNVHVATLE+DG+VTFLHKIE GPADKSYGIHVAKIAGLP DLL RA 



IL+QLE + SP T+ + E Q+SLF+ + ++ EL +D+ 
Sbjct: 780 ILTQLENQGTE SPPPMRQTSAVTE QISLFDR-AEEHPILAELAKLDVY 826 

Query: 841 NLTPMQAMNAIFDLKKLL 858 
N+TPMQ MN + +LK+ L 
Sbjct: 827 NMTPMQVMNVLVELKQKL 844 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3321> which encodes the amino acid 
sequence <SEQ ID 3322>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.38 Transmembrane 532 - 548 ( 532 - 549) 






Final Results 

bacterial membrane Certainty=0.1553 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 661/858 (77%), Positives = 746/858 (86%), Gaps = 7/858 (0%) 

Query: 1 MAKPTISPGMQQYLDIKENYPDAFLLFRMGDFYELFYDDAVKAAQILEISLTSRNKNAEK 60 
MAK ISPGMQQYLDIK++YPDAFLLFRMGDFYELFY+DAVKAAQ+LEI LTSRNKNAE 
Sbjct: 1 MAKTNISPGMQQYLDIKKDYPDAFLLFRMGDFYELFYEDAVKAAQLLEIGLTSRNKNAEN 60 

Query: 61 PIPMAGVPYHSAQQYIDVLVELGYKVAIAEQMEDPKKAVGVVKREVVQVVTPGTVVESTK 120 
PIPMAGVP+HSAQQYIDVL+ELGYKVA+AEQMEDPK+AVGVVKREVVQV+TPGTVV+S K 
Sbjct: 61 PIPMAGVPHHSAQQYIDVLIELGYKVAVAEQMEDPKQAVGVVKREVVQVITPGTVVDSAK 120 

Query: 121 PDSANNFLVAIDSQDQQTFGLAYMDVSTGEFQATLLTDFESVRSEILNLKAREIWGYQL 180 
PDSANNFLVA+D D +GLAYMDVSTGEF T L DF SVRSEI NLKA+E+++G+ L 
Sbjct: 121 PDSANNFLVAVDF-DGCRYGLAYMDVSTGEFCVTDLADFTSVRSEIQNLKAKEVLLGFDL 179 

Query: 181 TDEKNHLLTKQMNLLLSYEDERLNDIHLIDEQLTDLEISAAEKLLQYVHRTQKRELSHLQ 240 
++E+ +L KQMNLLLSYE+ D LID QLT +E++AA KLLQYVH+TQ RELSHLQ 
Sbjct: 180 SEEEQTILVKQMNLLLSYEETVYEDKSLIDGQLTTVELTAAGKLLQYVHKTQMRELSHLQ 239 

Query: 241 KVVHYEIKDYLQMSYATKNSLDLLENARTSKKHGSLYWLLDETKTAMGTRMLRTWIDRPL 300 
+VHYEIKDYLQMSYATK+SLDL+ENART+KKHGSLYWLLDETKTAMG R+LR+WIDRPL 
Sbjct: 240 ALVHYEIKDYLQMSYATKSSLDLVENARTNKKHGSLYWLLDETKTAMGMRLLRSWIDRPL 299 

Query: 301 VSMNRIKERQDIIQVFLDYFFERNDLTESLKGVYDIERLASRVSFGKANPKDLLQLGQTL 360 
VS I ERQ+IIQVFL+ F ER DL+ SLKGVYDIERL+SRVSFGKANPKDLLQLG TL 
Sbjct: 300 VSKEAILERQEIIQVFLNAFIERTDLSNSLKGVYDIERLSSRVSFGKANPKDLLQLGHTL 359 

Query: 361 SQIPRIKMILQSFNQPELDIIVNKIDTMPELESLINTAIAPEAQATITEGNIIKSGFDKQ 420 
+Q+P IK IL+SF+ P +D +VN ID++PELE LI TAI P+A ATI+EG+II++GFD++ 
Sbjct: 360 AQVPYIKAILESFDSPCVDKLVNDIDSLPELEYLIRTAIDPDAPATISEGSIIRNGFDER 419 

Query: 421 LDNYRTVMREGTGWIADIEAKERAASGIGTLKIDYNKKDGYYFHVTNSNLSLVPEHFFRK 480 
LD+YR VMREGTGWIADIEAKER ASGI LKIDYNKKDGYYFHVTNSNLSLVPEHFFRK 
Sbjct: 420 LDHYRKVMREGTGWIADIEAKERQASGINNLKIDYNKKDGYYFHVTNSNLSLVPEHFFRK 479 

Query: 481 ATLKNSERYGTAELAKIEGEMLEAREQSSNLEYDIFMRWAQVESYIKRLQEIAKTIATV 540 
ATLKNSERYGTAELAKIEG+MLEARE+SS+LEYDIFM +RAQVE+YI RLQ+LAK +ATV 
Sbjct: 480 ATLKNSERYGTAELAKIEGQMLEAREESSSLEYDIFMCIRAQVETYINRLQKLAKILATV 539 

Query: 541 DVLQSLAVVAENYHYVRPKFNDQHQIKIKNGRHATVEKVMGVQEYIPNSIYFDSQTDIQL 600 
DVLQSLAVVAE HY+RP+FND H I I+ GRHA VEKVMGVQEYIPNSI FD QT IQL 
Sbjct: 540 DVLQSLAVVAETNHYIRPQFNDKHVITIQEGRHAVVEKVMGVQEYIPNSISFDQQTSIQL 599 

Query: 601 ITGPNMSGKSTYMRQLALTVIMAQMGGFVSADEVDLPVFDAIFTRIGAADDLISGQSTFM 660 
ITGPNMSGKSTYMRQLALTVIMAQMG FV+AD VDLP+FDAIFTRIGAADDLISGQSTFM 
Sbjct: 600 ITGPNMSGKSTYMRQLALTVIMAQMGSFVAADHVDLPLFDAIFTRIGAADDLISGQSTFM 659 

Query: 661 VEMMEANQAVKRASDKSLILFDELGRGTATYDGMALAQSIIEYIHDRVRAKTMFATHYHE 720 
VEMMEANQA+KRASD SLILFDELGRGTATYDGMALAQ+IIEYIHDRV AKT+FATHYHE 
Sbjct: 660 VEMMEANQAIKRASDNSLILFDELGRGTATYDGMALAQAIIEYIHDRVGAKTIFATHYHE 719 

Query: 721 LTDLSEQLTRLVNVHVATLERDGEVTFLHKIESGPADKSYGIHVAKIAGLPIDLLDRATD 780 
LTDLS LT LVNVHVATLE+DG+VTFLHKI GPADKSYGIHVAKIAGLP LL RA + 
Sbjct: 720 LTDLSTNLTSLVNVHVATLEKDGDVTFLHKIAEGPADKSYGIHVAKIAGLPKSLLKRADE 779 

Query: 781 ILSQLEADAVQLIVSPSQEAVTADLNEELDSEKQQGQLSLFEEPSNAGRVIEELEAIDIM 840 
+L++LE S S E ++ E S +QGQLSLF + A + + LE ID+M 
Sbjct: 780 VLTRLETQ SRSTEIISVPSQVESSSAVRQGQLSLFGDEEKAHEIRQALEVIDVM 833 

Query: 841 NLTPMQAMNAIFDLKKLL 858 
N+TP+QAM +++LKKLL 
Sbjct: 834 NMTPLQAMTTLYELKKLL 851 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1078 

A DNA sequence (GBSx1152) was identified in S.agalactiae <SEQ ID 3323> which encodes the amino 
acid sequence <SEQ ID 3324>. This protein is predicted to be cold shock protein-related protein. Analysis 
of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2095 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB69404 GB:A91080 unnamed protein product [unidentified] 
Identities = 48/63 (76%), Positives = 56/63 (88%) 

Query: 1 MTQGTVKWFNSEKGFGFISSETGTDVFAHFSEIKVDGFKTLEEGQKVTFDIQDGQRGPQA 60 
MT+GTVKWFN +KGFGFI+SE G DVFAHFS+I+ GFKTL+EGQKVTFD++ GQRGPQA 
Sbjct: 1 MTKGTVKWFNPDKGFGFITSEDGQDVFAHFSQIQTSGFKTLDEGQKVTFDVEAGQRGPQA 60 

Query: 61 TNI 63 
NI 
Sbjct: 61 VNI 63 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3325> which encodes the amino acid 
sequence <SEQ ID 3326>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2350 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 49/63 (77%), Positives = 56/63 (88%) 

Query: 1 MTQGTVKWFNSEKGFGFISSETGTDVFAHFSEIKVDGFKTLEEGQKVTFDIQDGQRGPQA 60 

M QGTVKWFN+EKGFGFIS+E G DVFAHFS I+ +GFKTLEEGQKV FD+++GQRGPQA 
Sbjct: 3 ^QGTVKWFNAEKGFGFISTENGQDVFAHFSAIQTNGFKTLEEGQKVAFDVEEGQRGPQA 62 

Query: 61 TNI 63 
NI 

Sbjct: 63 VNI 65 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 1079 

A DNA sequence (GBSx1153) was identified in S.agalactiae <SEQ ID 3327> which encodes the amino 
acid sequence <SEQ ID 3328>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.6378 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
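
The "Final Results" lines reported for each protein follow a fixed pattern (location, certainty value, call), so they can be tabulated mechanically. The following is a minimal sketch, assuming only the line layout visible in this text; the names `parse_final_results` and `best_site` are hypothetical, and the pattern deliberately tolerates the stray spaces and dashes that scanning introduces into the certainty values.

```python
import re

# One "Final Results" line: location, certainty value, call in parentheses.
# Assumption: stray spaces inside the number are scanning artifacts.
LINE = re.compile(
    r"bacterial\s+(?P<site>\w+)\W*Certainty\s*=\s*"
    r"(?P<value>[\d\s.]+?)\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(text: str) -> dict[str, tuple[float, str]]:
    """Map each predicted location to (certainty, call)."""
    results = {}
    for m in LINE.finditer(text):
        value = float(re.sub(r"\s+", "", m.group("value")))
        results[m.group("site")] = (value, m.group("call").strip())
    return results


def best_site(results: dict[str, tuple[float, str]]) -> str:
    """Return the location with the highest certainty."""
    return max(results, key=lambda site: results[site][0])


# Lines as they appear for GBSx1153 in the scanned text.
sample = """
bacterial cytoplasm Certainty=0 . 6378 (Affirmative) < suco
bacterial membrane Certainty=0 . 0000 (Not Clear) < suco
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
"""

results = parse_final_results(sample)
print(best_site(results), results)
```

Collecting these values per example makes it straightforward to list, for instance, which proteins were assigned to the membrane rather than the cytoplasm.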

Example 1080 

A DNA sequence (GBSx1154) was identified in S.agalactiae <SEQ ID 3329> which encodes the amino 
acid sequence <SEQ ID 3330>. This protein is predicted to be DNA mismatch repair protein hexb (mutL). 
Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2242 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10267> which encodes amino acid sequence <SEQ ID 
10268> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAA88600 GB:M29686 mismatch repair protein [Streptococcus pneumoniae] 
Identities = 452/657 (68%), Positives = 543/657 (81%), Gaps = 8/657 (1%) 

Query: 20 LSKIIELPDILANQIAAGEVVERPSSVVKELVENAIDAGSSQITIEVEESGLKKIQITDN 79 
+S IIELP++LANQIAAGEV+ERP+SV KELVENAIDAGSSQI IE+EE+GLKK+QITDN 
Sbjct: 1 MSHIIELPEMLANQIAAGEVIERPASVCKELVENAIDAGSSQIIIEIEEAGLKKVQITDN 60 

Query: 80 GEGMTSEDAVLSLRRHATSKIKSQSDLFRIRTLGFRGEALPSIASISLMTIKTATEQGKQ 139 
G G+ ++ L+LRRHATSKIK+Q+DLFRIRTLGFRGEALPSIAS+S++T+ TA + 
Sbjct: 61 GHGIAHDEVELALRRHATSKIKNQADLFRIRTLGFRGEALPSIASVSVLTLLTAVDGASH 120 

Query: 140 GTLLVAKGGNIEKQEWSSPRGTKILVENLFFNTPARLKYMKSLQSELAHIIDIVNRLSL 199 
GT LVA+GG +E+ +SP GTK+ VE+LFFNTPARLKYMKS Q+EL+HIIDIVNRL L 
Sbjct: 121 GTKLVARGGEVEEVIPATSPVGTKVCVEDLFFNTPARLKYMKSQQAELSHIIDIVNRLGL 180 

Query: 200 AHPEVAFTLINDGKEMTKTSGTGDLRQAIAGIYGLNTAKKMIEISNADLDFEISGYVSLP 259 
AHPE++F+LI+DGKEMT+T+GTG LRQAIAGIYGL +AKKMIEI N+DLDFEISG+VSLP 
Sbjct: 181 AHPEISFSLISDGKEMTRTAGTGQLRQAIAGIYGLVSAKKMIEIENSDLDFEISGFVSLP 240 

Query: 260 ELTRANRNYITLLINGRYIKNFLLNRSILDGYGSKLMVGRFPIAVIDIQIDPYLADVNVH 319 
ELTRANRNYI+L INGRYIKNFLLNR+ILDG+GSKLMVGRFP+AVI I IDPYLADVNVH 
Sbjct: 241 ELTRANRNYISLFINGRYIKNFLLNRAILDGFGSKLMVGRFPLAVIHIHIDPYLADVNVH 300 

Query: 320 PTKQEVRISKERELMSLISTAISESLKQYDLIPDALENLAKTSTRSVDKPIQTSFSLKQP 379 
PTKQEVRISKE+ELM+L+S AI+ SLK+ LIPDALENLAK++ R+ +K QT LK+ 
Sbjct: 301 PTKQEVRISKEKELMTLVSFAIANSLKEQTLIPDALENLAKSTVRNREKVEQTlLPLKEN 360 

Query: 380 GLYYDRAKNDFFIGADTVSEPIANFTNLDKSDGSVDNDVKNSVNQGATQSPNIKYASRDQ 439 
LYY++ + + +E L + K ++++ T+ + +A R 
Sbjct: 361 TLYYEKTEP SRPSQTEVADYQVELTDEGQDLTLFAKETLDR-LTKPAKLHFAERKP 415 

Query: 440 ADSENFIHSQDYLSSKQSLNKLVEKLDSEESSTFPELEFFGQMHGTYLFAQGNGGLYIID 499 
A+ + H + L+ S++K +KL+ EE+S+FPELEFFGQMHGTYLFAQG GLYIID 
Sbjct: 416 ANYDQLDHPELDLA---SIDKAYDKLEREEASSFPELEFFGQMHGTYLFAQGRDGLYIID 472 

Query: 500 QHAAQERVKYEYYREKIGEVDNSLQQLLVPFLFEFSSSDFLQLQEKMSLLQDVGIFLEPY 559 
QHAAQERVKYE YRE IG VD S QQLLVP++FEF + D L+L+E+M LL++VG+FL Y 
Sbjct: 473 QHAAQERVKYEEYRESIGNVDQSQQQLLVPYIFEFPADDALRLKERMPLLEEVGVFLAEY 532 

Query: 560 GNNTFILREHPIWMKEEEVESGIYEMCDMLLLTNEVSVKKYRAELAIMMSCKRSIKANHT 619 
G N FILREHPIWM EEE+ESGIYEMCDMLLLT EVS+KKYRAELAIMMSCKRSIKANH 
Sbjct: 533 GENQFILREHPIVMAEEEIESGIYEMCDMLLLTKEVSIKKYRAELAIMMSCKRSIKANHR 592 

Query: 620 LDDYSARHLLDQLAQCKNPYNCPHGRPVLVNFTKADMEKMFKRIQENHTSLRDLGKY 676 
+DD+SAR LL QL+QC NPYNCPHGRPVLV+FTK+DMEKMF+RIQENHTSLR+LGKY 
Sbjct: 593 IDDHSARQLLYQLSQCDNPYNCPHGRPVLVHFTKSDMEKMFRRIQENHTSLRELGKY 649 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3331> which encodes the amino acid 
sequence <SEQ ID 3332>. Analysis of this protein sequence reveals the following: 

Possible site: 25 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1854 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 502/663 (75%), Positives = 574/663 (85%), Gaps = 9/663 (1%) 

Query: 20 LSKIIELPDILANQIAAGEVVERPSSVVKELVENAIDAGSSQITIEVEESGLKKIQITDN 79 
++ IIELP++LANQIAAGEVVERP+SVVKELVENAIDA SSQIT+E+EESGLK IQ+TDN 
Sbjct: 14 MTNIIELPEVLANQIAAGEVVERPASVVKELVENAIDAKSSQITVEIEESGLKMIQVTDN 73 

Query: 80 GEGMTSEDAVLSLRRHATSKIKSQSDLFRIRTLGFRGEALPSIASISLMTIKTATEQGKQ 139 
GEGM+ ED LSLRRHATSKIKSQSDLFRIRTLGFRGEALPS+ASIS +TIKTAT++ 
Sbjct: 74 GEGMSHEDLPLSLRRHATSKIKSQSDLFRIRTLGFRGEALPSVASISKITIKTATKEVTH 133 

Query: 140 GTLLVAKGGNIEKQEWSSPRGTKILVENLFFNTPARLKYMKSLQSELAHIIDIVNRLSL 199 
G+LL+A GG IE E +S+P GTKI VENLF+NTPARLKYMKSLQ+ELAHI+D+VNRLSL 
Sbjct: 134 GSLLIATGGEIETLEAISTPTGTKIKVra&Fm-PARLKYMKSLQAELAHIVDVVNRLSL 193 

Query: 200 AHPEVAFTLINDGKEMTKTSGTGDLRQAIAGIYGLNTAKKMIEISNADLDFEISGYVSLP 259 
AHPEVAFTLI+DG+++T+TSGTGDLRQAIAGIYGLNT KKM+ ISNADLDFE+SGYVSLP 
Sbjct: 194 AHPEVAFTLISDGRQLTQTSGTGDLRQAIAGIYGLNTTKKMLAISNADLDFEVSGYVSLP 253 

Query: 260 ELTRANRNYITLLINGRYIKNFLLNRSILDGYGSKLMVGRFPIAVIDIQIDPYLADVNVH 319 
ELTRANRNY+T+L+NGRYIKNFLLNR+ILDGYGSKLMVGRFPI VIDIQIDPYLADVNVH 
Sbjct: 254 ELTRANRNYMTILWGRYIKNFLLNRAILDGYGSKLMVGRFPIWIDIQIDPYLADVNVH 313 

Query: 320 PTKQEVRISKERELMSLISTAISESLKQYDLIPDALENLAKTSTRSVDKPIQTSFSLKQP 379 
PTKQEVRISKERELM+LISTAISESLK+ DLIPDALENLAK+STR KP QT L+ 
Sbjct: 314 PTKQEVRISKERELMALISTAISESLKEQDLIPDALENLAKSSTRHFSKPEQTQLPLQSR 373 

Query: 380 GLYYDRAKNDFFIGADTVSEPIANFTNLDKSDGSVDNDVKNSV NQGATQSPNIK 433 
GLYYD KNDFF+ VSE I D G+VDN VK ++ ++K 
Sbjct: 374 GLYYDPQKNDFFVKESAVSEKI PETDFYSGAVDNSVKVEKVELLPHSEEVIGPSSVK 430 

Query: 434 YASRDQADSENFIHSQDYLSSKQSLNKLVEKLDSEESSTFPELEFFGQMHGTYLFAQGNG 493 
+ASR Q H L ++Q L++++ +L++E S FPEL++FGQMHGTYLFAQG 
Sbjct: 431 HASRPQNTFTETDHPNLDLKNRQKLSQMLTRLENEGQSVFPELDYFGQMHGTYLFAQGKD 490 

Query: 494 GLYIIDQHAAQERVKYEYYREKIGEVDNSLQQLLVPFLFEFSSSDFLQLQEKMSLLQDVG 553 
GL+IIDQHAAQERVKYEYYR+KIGEVD+SLQQLLVP+LFEFS SDF+ LQEKM+LL +VG 
Sbjct: 491 GLFIIDQHAAQERVKYEYYRDKIGEVDSSLQQLLVPYLFEFSGSDFINLQEKMALLNEVG 550 

Query: 554 IFLEPYGNNTFILREHPIVmKEEEVESGIYEMCDMLLLTNEVSVKKYRAELAIMMSCKRS 613 
IFLE YG+NTFILREHPIWMKEEE+ SG+YEMCDMLLLTNEVS+K YRAELAIMMSCKRS 
Sbjct: 551 IFLEVYGHNTFILREHPIWMKEEEIASGVYEMCDMLLLTNEVSIKTYRAELAIMMSCKRS 610 

Query: 614 IKANHTLDDYSARHLLDQLAQCKNPYNCPHGRPVLVNFTKADMEKMFKRIQENHTSLRDLGKY 676 
IKANH+LDDYSAR+LL QLAQC+NPYNCPHGRPVL+NF+KADMEKMF+RIQENHTSLR+LGKY 
Sbjct: 611 IKANHSLDDYSARNLLLQLAQCQNPYNCPHGRPVLINFSKADMEKMFRRIQENHTSLRELGKY 673 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1081 

A DNA sequence (GBSx1155) was identified in S.agalactiae <SEQ ID 3333> which encodes the amino 
acid sequence <SEQ ID 3334>. Analysis of this protein sequence reveals the following: 

Possible site: IS 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.3372 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1082 

A DNA sequence (GBSx1156) was identified in S.agalactiae <SEQ ID 3335> which encodes the amino 
acid sequence <SEQ ID 3336>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane Certainty=0.6604 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10265> which encodes amino acid sequence <SEQ ID 
10266> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA61918 GB:X89779 LmrP integral membrane protein [Lactococcus 
lactis] 

Identities = 145/401 (36%) , Positives = 236/401 (58%) , Gaps = 4/401 (0%) 

Query: 9 VKEFFALPKQLQLRELLRFISIWGSAIFPFMAMYYVQYFGNLVTGILIIITQLSGFVAT 68 

+KEF+ L K LQLR + F+ +F M +YY QY G+ +TGIL+ ++ ++ FVA 

Sbjct: 1 MKEFWNLDKNLQLRLGIVFLGAFSYGTVFSSMTIYYNQYLGSAITGILLALSAVATFVAG 60 

Query: 69 LYGGHLSDAMGRKKWIIGSLLATIGWAITIAANVPNHITPHLTFVGILIIEIAHQFYFP 128 

+ G +D GRK V++ G+++ +G A+ IA+N+P H+ P TF+ L+I + F 
Sbjct: 61 ILAGFFADRNGRKPVMVFGTIIQLLGAALAIASNLPGHVNPWSTFIAFLLISFGYNFVIT 120 

Query: 129 AYEAMTIDLTNEQNRRFVYTIGYWLVNIAVMLGSGIAGIFYDHHFFELLIVLLIISAICC 188 
A AM ID +N +NR+ V+ + YW N++V+LG+ + + F LL++LL+ + 
Sbjct: 121 AGNMIIDASNMITOKAWFMLDYWAQNLSVILGAALGAWLFRPAFEALLVILLLTVLVSF 180 

Query: 189 FVVYFKFDET-KPQEGTFKHDKGVLGTFKNYSQVLVDKAFWYTLGAIGSSVVWLQVDNY 247 

F+ F ET KP T K D+ F+ Y VL DK ++++ I ++ + +Q DN+ 

Sbjct: 181 FLTTFVMTETFKP TVKVDEKAENIFQAYKTVLQDKTYMIFMGANIATTFIIMQFDNF 237 

Query: 248 FSTOLKQNFEWSILGHTITGAKMLSIAVFTNTLLIVLLMTTINKFIENWPLKRQLILGS 307 

V+L +F+ ++ G 1 G +ML++ + +L+VLLMTT+N+ ++W ++ I GS 
Sbjct: 238 LPWLSNSFKTITFWGFEIYGQRMLTIYMLACVLVVLLMTTLNRLTKDWSHQKGFIWGS 297 

Query: 308 LICGFGMLFNISLNTFGAILIAMTFFTFGEMIYVPASQVLRAEMMVEGKIGSYSGFLAIA 367 

L GM+F+ TF I IA +T GE++Y P+ Q L A++M KIGSY+G AI 
Sbjct: 298 LFMAIGMIFSFLTTTFTPIFIAGIVYTLGEIVYTPSVQTLGADLMNPEKIGSYNGVAAIK 357 

Query: 368 QPVASVIAGAMVSLSYFTGKIGVQITLTIFMLAGLVLILYA 408 

P+AS+IAG +VS+S IGV + L + + ++L+L A 

Sbjct: 358 MPIASIIAGLLVSISPMIKAIGVSLVLALTEVLA.IILVLVA 398 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3337> which encodes the amino acid 
sequence <SEQ ID 3338>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane Certainty=0.5564 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:CAA61918 GB:X89779 LmrP integral membrane protein [Lactococcus lactis] 

Identities = 138/400 (34%), Positives = 223/400 (55%), Gaps = 2/400 (0%) 

Query: 1 MQEFLNLPKQIQLRQLWFWITLGSSIFPFMAMYYTTYFGTFWTGLLMMITSLMGFVGT 60 

M+EF NL K +QLR + F+ ++F M +YY Y G+ TG+L+ ++++ FV 

Sbjct: 1 MKEFWNLDKNLQLRLGIVFLGAFSYGTVFSSMTIYYNQYLGSAITGILLALSAVATFVAG 60 

Query: 61 LYGGHLSDALGRKKVIMIGSVGTTLGWFLTILRNLPNAAIPWLTFAGILLVEIASSFYGP 120 

+ G +D GRK V++ G++ LG L I +NLP PW TF LL+ +F 
Sbjct: 61 ILAGFFADRNGRKPVMVFGTIIQLLGAALAIASNLPGHVNPWSTFIAFLLISFGYNFVIT 120 

Query: 121 AYFAMLIDLTDESNRRFVYTINYWFINIAVMFGAGLSGLFYDHHFIALLVALLLVNVLCF 180 

A AM+ID ++ NR+ V+ ++YW N++V+ GA L + F ALLV LLL ++ F 
Sbjct: 121 AGNAMIIDASNAENRKWFMLDYWAQNLSVILGAALGAWLFRPAFFJUiLVILLLTVLVSF 180 

Query: 181 GVAYYCFDETRPETHAFDHGKGLLASFQNYRQVFHDRAFVLFTLGAIFSGSIWMQMDNYV 240 

+ + ET T D + FQ Y+ V D+ +++F I + I MQ DN++ 

Sbjct: 181 FLTTFVMTETFKPTVKVDEKAENI - - FQAYKTVLQDKTYMI FMGANIATTFI IMQFDNFL 238 

Query: 241 PWLKLYFQPTAVLGFQVTSSKMLSLMVLTNTLLIVLFMTVVNKLTEKWKLLPQLVVGSL 300 

PVHL F+ GF++ +ML++ ++ +L+VL MT +N+LT+ W + GSL 

Sbjct: 239 PVHLSNSFKTITFWGFEIYGQRMLTIYLILACVLWLLMTTLNRLTKDWSHQKGFIWGSL 298 



Query: 301 LFTLGMLLSFTFTQFYAIWLSVVLLTFGEMINVSASQVLRADMMDHSQIGSYTGFVSMAQ 360 
+GM+ SF T F I+++ ++ T GE++ + Q L AD+M+ +IGSY G ++ 
Sbjct: 299 FMAIGMIFSFLTTTFTPIFIAGIVYTLGEIVYTPSVQTLGADLMNPEKIGSYNGVAAIKM 358 



Query: 361 PLGAILASLLVSVSHFTGPLGVQCLFAVIMiLGIYFTWS 400 

P+ +ILA LLVS+S +GV + A+ +L I +V+ 

Sbjct: 359 PIASILAGLLVSISPMIKAIGVSLVIALTEVIAIILvIiVA 398 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 228/406 (56%) , Positives = 305/406 (74%) 

Query: 9 VKEFFALPKQLQLRELLRF1 S I TVGSAI FPFMftMYYVQYFGNL VTGI LI I ITQLSGFVAT 68 

++EF LPKQ+QLR+L+RF++ I T+GS+ I FPFMAMYY YFG TG+L++IT L GFV T 
Sbjct: 1 MQEFIMLPKQIQLRQLTOFVTITMSSIFPFMaMYYTTYFGTFWTGLLMMITSLMGFVGT 60 

Query: 69 LYGGHLSDAMGRKKWI IGSLLATIGWAITIAANVPNHITPHLTFVGILI IEIAHQFYFP 128 

LYGGHLSDA+GRKKV++IGS+ T+GW +TI AN+PN P LTF GIL++EIA FY P 
Sbjct: 61 LYGGHLSDALGRKKVIMIGSVGTTLGWFLTILANLPNAAIPWLTFAGILLVEIASSFYGP 120 

Query: 129 AYEAMTIDLTNEQNRRFVYTIGYWLVNIAAWILGSGIAGIFYDHHFFELLIVLLIISAICC 188 

AYEAM IDLT+E NRRFVYTI YW +NIAVM G+G++G+FYDHHF LL+ LL+++ +C 
Sbjct: 121 AYFiAMLIDLTDESNRRFWTINYWFINIAVMFGAGLSGLFYDHHFLALLVALLLVNVLCF 180 



Query: 189 FVVYFKFDETKPQEGTFKHDKGVLGTFKNYSQVLVDKAFVVYTLGAIGSSVVWLQVDMYF 248 

V Y+ FDET+P+ F H KG+L +F+NY QV D+AFV+ +TLGAI S +W+Q+DNY 
Sbjct: 181 GVAYYCFDETRPETHAFDHGKGLLASFQNYRQVFHDRAFVLFTLGAI FSGS IWMQMDNYV 240 

Query: 249 S^m 1 KQNFEWSILGHTITGAKMLSIAvPTOTLLIVLLMTTINKFIENWPLKRQLILGSL 308 

V+LK F+ ++LG +T +KMLSL V TNTLLIVL MT +NK E W L QL++GSL 
Sbjct: 241 PVHLKLYFQPTAVLGFQVTSSKMLSLMVLTNTLLIVLFMTvVNKLTEKWKLLPQLvVGSL 300 

Query: 309 ICGFGMLFNISIOTFGAILIAKTFFTFGEMIYVPASQVLRAEMMVEGKIGSYSGFLAIAQ 368 

+ GML + + F AI +++ TFGEMI V ASQVLRA+MM +IGSY+GF+++AQ 
Sbjct: 301 LFTLGMLLSFTFTQFYAIWLSVVLLTFGEMINVSASQVLRADMMDHSQIGSYTGFVSMAQ 360 

Query: 369 PVASVLAGAMVSLSYFTGKIGVQITLTIFMLAGLVLILYATKMKNI 414 

P+ ++LA +VS+S+FTG +GVQ + L G+ + + KMK + 
Sbjct: 361 PLGAILASLLVSVSHFTGPLGVQCLFAVIALLGIYFOWSAKMKKV 406 

A related GBS gene <SEQ ID 8725> and protein <SEQ ID 8726> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crenel: 8 
SRCFLG: 0 

McG: Length of UR: 4 

Peak Value of UR: 1.73 

Net Charge of CR: 1 
McG: Discrim Score: -4.26 
GvH: Signal Score (-7.5): -2.48 

Possible site: 35 
>>> Seems to have no N-terminal signal sequence 
Amino Acid Composition: calculated from 1 
ALOM program count: 12 value: -14.01 threshold: 0.0 
icml HYPID: 7 CFP: 0.660 



*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0.6604 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 



The protein has homology with the following sequences in the databases: 

ORF01675(325 - 1530 of 1854) 

EGAD|40187|42348 (1 - 400 of 408) integral membrane protein (lmrP) {Lactococcus lactis} 
GP|1052754|emb|CAA61918.1||X89779 LmrP integral membrane protein {Lactococcus lactis} 
PIR|S58131|S58131 integral membrane protein LmrP - Lactococcus lactis 
%Match = 21.7 

%Identity = 36.2 %Similarity = 60.8 

Matches = 145 Mismatches = 155 Conservative Sub.s = 99 

243 273 303 333 363 393 423 453 

LQKLIVTOKCUSESKKIIQASG1*ENIDNYLLGKKGEKVKEFFALPKQLQLRELLRFISITVGSAIFPFMAMYYVQYFGNL 

:|||: I I Mil : \: = 11 =11 11 = 1 = 

MKEFWNLDKNLQLRLGIVFLGAFSYGTVFSSMTIYYNQYLGSA 
10 20 30 40 

483 513 543 573 603 633 663 693 

VTGILIIITQLSGFVATLYGGHLSDAMGRKKVVIIGSLLATIGWAITIAANVPNHITPHLTFVGILIIEIAHQFYFPAYE 

=1111= == == III = I ::| III l = = l=== =1 1= 11=1=1 1= I 11= 1=1 =1 I 
ITGILIiALSAVATFVAGIIiAGFFADRNGRKPVMVFGTIIQLLGAAlAIASNLPGHVNPWSTFIAFLLISFGYNFVITAGN 

60 70 80 90 100 110 120 


723 753 783 813 843 873 900 930 

AMTIDLTNEQNRRFvYTIGYWLTOIAVMLGSGIAGIFYDHHFFELLIVLLIISAICCFVVYFKFDET-KPQEGTFKHDKG 

II II =1 =11= 1= = II |::|:||: = == I ll==ll= =1=1 II II 111= 
AMI IDASNAENRBCWFMLDYWAQNLSVII^GAAIiGAWLFRPAFEALLVILLLTVLVSFFLTTFVMTETFKP TVKVDEK 

140 150 160 170 180 190 200 

960 990 1020 1050 1080 1110 1140 1170 

VLGTFKNYSQVLVDKAFVvYTLGAIGSSVWLQVDNYFSWLKQNF 

1= I II II ==== I == = =1 l|:= 1=1 =1= ==111 =ll== = =1=111111= 

AENIFQAYKTvLQDKTYMIFMGANIATTFIIMQFDNFLPVHLSNSFKTITFWGFEIYGQRMLTIYLILACVLVVLLMTTL 

210 220 230 240 250 260 270 280 

1200 1230 1260 1290 1320 1350 1380 1410 

NKFIENWPLKRQLILGSLICGFGMLFNISLWTFGAILIAMTFFTFGEMIYVPASQVLRAEMMVEGKIGSYSGFLAIAQPV 
45 |:: ::| :: :| ||| ||:|: || |:|| : | : | | : : | | : | | | : : | | | | || : | || | = 

NRLTKDWSHQKGFIWGSLFMAIGMIFSFLTTTFTPIFIAGIVYTLGEIVYTPSVQTLGADLMNPEKIGSYNGVAAIKMPI 
290 300 310 320 330 340 350 360 

1440 1470 1500 1530 1560 1590 1620 1650 

ASVLAGAMVSLSYFTGKIGVQITLTIFMLAGLVLILYATKMKNIEIGK*NVRLY*RKIE*NNG*IYCCGNSWIGIHDICG 

11=111 =11=1 III = I = = ==l=l I 

ASIIAGLLVSISPMIKAIGVSLVLALTEVLAIILVLVAVNRHQKTKLN 

370 380 390 400 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1083 

A DNA sequence (GBSx1157) was identified in S.agalactiae <SEQ ID 3339> which encodes the amino 
acid sequence <SEQ ID 3340>. This protein is predicted to be holliday junction DNA helicase (ruvA). 
Analysis of this protein sequence reveals the following: 



Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.75 Transmembrane 75 - 91 ( 74 - 91) 

Final Results 

bacterial membrane Certainty=0.1702 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04943 GB:AP001511 holliday junction DNA helicase [Bacillus halodurans] 

Identities = 86/201 (42%), Positives = 122/201 (59%), Gaps = 6/201 (2%) 

Query: 1 MYDYIKGKLSKITAKFIWETAGLGYMIYVANPYSFSGYVNQEVTIYLHQVIRDDAHLLF 60 
M DY++G L+ I ++ WE G+GY +Y NPY F + +TIY Q +R+D L+ 
Sbjct: 1 MIDYLRGTLTDIDHQYAWEVHGVGYQVYCPNPYEFEKERDSVITIYTFQYVREDVIRLY 60 

Query: 61 GFHTENEKEIFLNLISVSGIGPTTALAIIAVDDNEGLVSAIDNSDIKYLTKFPKIGKKTA 120 
GF T+ ++ +F L++VSGIGP ALAI+A E ++ AI+ D +L KFP +GKKTA 
Sbjct: 61 GFRTKEKRSLFEKLLNVSGIGPKGALAILATGQPEHVIQAIEEEDEAFLVKFPGVGKKTA 120 

Query: 121 QQMILDLSGKFVE ASGESATSRKVSSEQNSNLEEAMEALLALGYKATELKKVKA 174 
+Q+ILDL GK E + E ++ N L+EAMEAL ALGY ELKKVK 
Sbjct: 121 RQIILDLKGKVDELHPGLFSQKEEQPKPHEKNDGNQALDEAMEALKALGYVEKELKKVKP 180 

Query: 175 FFEGTNETVEQYIKSSLKMLM 195 
E T + YIK +L++++ 
Sbjct: 181 KLEQETLTTDAYIKKALQLML 201 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3341> which encodes the amino acid 
sequence <SEQ ID 3342>. Analysis of this protein sequence reveals the following: 

Possible site: 37 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.59 Transmembrane 75 - 91 ( 74 - 91) 

Final Results 

bacterial membrane Certainty=0.1638 (Affirmative) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0.0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:BAB04943 GB:AP001511 holliday junction DNA helicase [Bacillus halodurans] 
Identities = 91/201 (45%), Positives = 128/201 (63%), Gaps = 5/201 (2%) 

Query: 1 MYDYIKGQLTKITAKYIVVEANGLGYMINVANPYSFTDSVNQLVTIYLHQVIREDAHLLF 60 
M DY++G LT I +Y WE +G+GY + NPY F + ++TIY Q +RED L+ 
Sbjct: 1 MIDYLRGTLTDIDHQYAWEVHGVGYQVYCPNPYEFEKERDSVITIYTFQYVREDVIRLY 60 

Query: 61 
GF T++++ +F KL++VSGIGP ALAI+A E ++ AI+ D +L+KFP +GKKTA 
Sbjct: 61 

Query: 121 
+Q++LDL GK E Q+ K GN LDEA+EAL ALGY KELKK++ 
Sbjct: 121 

Query: 176 
E + T + YIK AL+L++ 
Sbjct: 181 DEQETLTTDAYIKKALQLML 201 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 153/197 (77%), Positives = 176/197 (88%), Gaps = 1/197 (0%) 

Query: 1 MYDYIKGKLSKITAKFIWETAGLGYMIYVANPYSFSGYVNQEVTIYLHQVIRDDAHLLF 60 
MYDYIKG+L+KITAK+IWE GLGYMI VANPYSF+ VNQ VTIYLHQVIR+DAHLLF 
Sbjct: 1 MYDYIKGQLTKITAKYIVVEANGLGYMINVANPYSFTDSVNQLVTIYLHQVIREDAHLLF 60 

Query: 61 GFHTENEKEIFLNLISVSGIGPTTALAIIAVDDNEGLVSAIDNSDIKYLTKFPKIGKKTA 120 
GFHTE+EK++FL LISVSGIGPTTALAI+AVDDNEGLV+AIDNSDIKYL KFPKIGKKTA 
Sbjct: 61 

Query: 121 
QQM+LDL+GKFVEA E+ T + + N+ L+EA+EALLALGYKA ELKK++AFFEGT 
Sbjct: 121 

Query: 180 
+ET EQYIKS+LK+LMK 
Sbjct: 181 SETAEQYIKSALKLLMK 197 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1084 

A DNA sequence (GBSx1159) was identified in S.agalactiae <SEQ ID 3343> which encodes the amino 
acid sequence <SEQ ID 3344>. This protein is predicted to be DNA-3-methyladenine glycosidase I (tag). 
Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.2812 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10263> which encodes amino acid sequence <SEQ ID 
10264> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC76573 GB:AE000432 3-methyl-adenine DNA glycosylase I, 
constitutive [Escherichia coli K12] 
Identities = 87/176 (49%), Positives = 122/176 (68%), Gaps = 1/176 (0%) 

Query: 5 MKRCSWVNLDNPLYVAYHDKEWGRAVHDDHVLFELLCLETYQSGLSWETVLNKRQEFRQV 64 
M+RC WV+ D PLY+AYHD EWG D LFE++CLE Q+GLSW TVL KR+ +R 
Sbjct: 1 MERCGWVSQD-PLYIAYHDNEWGVPETDSKKLFEMICLEGQQAGLSWITVLKKRENYRAC 59 

Query: 65 FHHYNIEKVAAMSDADLEIILQNPRVIRHRLKLFSTRQNARSIILIQKEFGSFDRYIWSF 124 
FH ++ KVAAM + D+E ++Q+ +IRHR K+ + NAR+ + +++ F ++WSF 
Sbjct: 60 FHQFDPVKVAAMQEEDVERLVQDAGIIRHRGKIQAIIGNARAYLQMEQNGEPFVDFVWSF 119 

Query: 125 TONKVQWSVNNYiroVPASTTLSERLSKDLKKRGFKFVGPTCLYSFIQAAGMVNDH 180 
V+++ QV +++P ST+ S+ LSK LKKRGFKFVG T YSF+QA G+VNDH 
Sbjct: 120 VNHQPQVTQATTLSEIPTSTSASDALSKALKKRGFKFVGTTICYSFMQACGLVNDH 175 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3345> which encodes the amino acid 
sequence <SEQ ID 3346>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.4149 (Affirmative) < suco 

bacterial membrane Certainty=0.0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 114/184 (61%), Positives = 135/184 (72%) 

Query: 3 FHMKRCSWVNLDNPLYVAYHDKEWGRAVHDDHVLFELLCLETYQSGLSWETVLNKRQEFR 62 
FHMKRCSWV DN LY YHD EWG+ + DD FELLCLE+YQSGLSW TVL+KRQ FR 
Sbjct: 2 FHMKRCSWVPKDNQLYCDYHDLEWGQPLDDDRDFFELLCLESYQSGLSWLTVLKKRQAFR 61 

Query: 63 QVFHHYNIEKVAAMSDADLEIILQNPRVIRHRLKLFSTRQNARSIILIQKEFGSFDRYIW 122 
VFHHY+I VA + ++ L+NP +IRH+LKL +T NA ++ IQKEFGSF Y+W 
Sbjct: 62 TOFHHYDIASVATFTSEEMADALENPSIIRHKLKIAATVNNAIAVQKIQKEFGSFSTYLW 121 

Query: 123 SFTONKVQWSVNISrYNDVPASTTLSERLSKDLKKRGFKFVGPTCLYSFIQAAGMVNDHEN 182 
+FV K N VN N VPA T LS RL+KDLKKRGFKF+GPT +YSF+QA+G+VNDHE 
Sbjct: 122 NWGGKPINNLVWQENLVPAQTELSIRLAKDLKKRGFKFLGPTTVYSFMQASGLVNDHEE 181 

Query: 183 ICDF 186 
C F 
Sbjct: 182 ACVF 185 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
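
The summary line printed above each alignment in this document (for example, "Identities = 114/184 (61%) , Positives = 135/184 (72%)") can be read programmatically. The following is a minimal Python sketch and is not part of the original analysis; the function name parse_alignment_summary and the regular expression are illustrative assumptions modeled only on the lines as printed here. It returns the counts and the percentage exactly as printed and does not recompute them.

```python
import re

# Hypothetical helper: the pattern mirrors summary lines as printed in this
# document and tolerates the stray spaces left by text extraction.
_SUMMARY = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {label: (count, aligned_length, printed_percent)} for one line."""
    return {
        label: (int(count), int(length), int(percent))
        for label, count, length, percent in _SUMMARY.findall(line)
    }


summary = parse_alignment_summary(
    "Identities = 114/184 (61%) , Positives = 135/184 (72%)"
)
print(summary["Identities"])  # (114, 184, 61)
```

Lines without a Gaps field simply produce no "Gaps" key, so callers can check for its presence before using it.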

Example 1085 

A DNA sequence (GBSx1160) was identified in S.agalactiae <SEQ ID 3347> which encodes the amino 
acid sequence <SEQ ID 3348>. This protein is predicted to be competence-damage inducible protein 
(cinA). Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have an uncleavable N-term signal seq 



A related GBS nucleic acid sequence <SEQ ID 10261> which encodes amino acid sequence <SEQ ID 
10262> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAA84071 GB:Z34303 CinA protein [Streptococcus pneumoniae] 
Identities = 194/297 (65%) , Positives = 236/297 (79%) , Gaps = 1/297 (0%) 

Query: 1 IWEGSIPLQNLTGIAVGGIVTSKGVQYMVLPGPPSELKPMVMEQWPILSNNGTKLYSRV 60 

+VEG+IPL N TGLAVGG + GV Y+VLPGPPSELKPMV+ Q++P L G+KLYSRV 
Sbjct: 121 I VEGAIPLPIffiTGI^VGGKLEVDGvTYvVIjPGPPSELKPMvIiNQLLPKLMT-GSKLYSRV 179 

Query: 61 LRFFGIGESQLVTILEDIIKNQTDPTIAPYAKVGEVTLRLSTKAENQDEADFKLDSLEKE 120 

LRFFGIGESQLVTIL D+I NQ DPT+APYAK GEVTLRLSTKA +Q+EA+ LD LE + 
Sbjct: 180 LRFFGIGESQLVTILADLIDNQIDPTLAPYAKTGEvTLRLSTKASSQEEANQALDILENQ 239 

Query: 121 ILALKTLDITOKLKDLLYGYGDNNSMARTVIiELLKVQNKTITAAESLTAGLFQSQIjAEFSG 180 

IL +T + L+D YGYG+ S+A V+E LK Q KTI AAESLTAGLFQ+ +A FSG 
Sbjct: 240 ILDCQTFEGISLRDFCYGYGEETSLASIVVEELKRQGKTIAAAESLTAGLFQATVANFSG 299 

Query: 181 ASQVFNGGFTTYSMFAKSQLLGIPKZKLQEYGWSHFTAEAMAQQARQLIjKADFGIGLTG 240 

S +F GGF TYS+E KS++L IP K L+E+GWS FTA+ MA+QAR ++DFGI LTG 
Sbjct: 300 VSSIFEGGFVTYSLEEKSRMLDIPAKNLEEHGWSEFTAQKMAEQARSKTQSDFGISLTG 359 

Query: 241 VAGPDELEGYPAGTVFIGIATPEGVSSIKVSIGGKSRSDVRHISTLHAFDLVRRALL 297 

VAGPD LEG+P GTVFIG+A +G IKV+IGG+SR+DVRHI+ +HAF+LVR+ALL 
Sbjct: 360 VAGPDSLEGHPVGTVFIGIAQDQGTEVIKVNIGGRSRADTOHIAVMHAFNLVRKALL 416 



Final Results 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3349> which encodes the amino acid 
sequence <SEQ ID 3350>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.91 Transmembrane 134 - 150 ( 134 - 150) 

Final Results 

bacterial membrane --- Certainty=0.1765 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:CAA84071 GB:Z34303 CinA protein [Streptococcus pneumoniae] 

Identities = 286/417 (68%), Positives = 336/417 (79%), Gaps = 1/417 (0%) 

Query: 1 MKAELIAVGTEILTGQI vNTNAQFLSEKMAELGIDVYFQTAVGDNEERLLSVITTASQRS 60 
MKAE+IAVGTEILTGQIVNTNAQFLSEK+AE+G+DVYFQTAVGDNE RLLS++ ASQRS 
Sbjct: 1 MKAEIIAVGTEILTGQIVNTNAQFLSEKLAEIGVDVYFQTAVGDNEVRLLSLLEIASQRS 60 

Query: 61 NLVILCGGLGPTKDDLTKQTLAKYLRKDLVYDEQACQKLDDFFAKRKPSSRTPNNERQAQ 120 

+LVIL GGLG T+DDLTKQTLAK+L K LV+D QA +KLD FFA R +RTPNNERQAQ 
Sbjct: 61 SLVILTGGLGATEDDLTKQTIAKFLGKALVFDPQAQEKLDIFFALRPDYARTPNNERQAQ 120 


Query: 121 VIEGSIPLPNKTGIAVGGFITVDGISYVVLPGPPSELKPMVNEELVPLLSKQySTLYSKV 180 

++EG+IPLPN+TGLAVGG + VDG++YWLPGPPSELKPMV +L+P L S LYS+V 
Sbjct: 121 IVEGAIPLPNETGIjAVGGKLEvDGVTYVvLPGPPSELKPMVLNQLLPKLMTG-SKLYSRV 179 

Query: 181 LRFFGIGESQLVTVLSDFIENQTDPTIAPYAKTGEVTLRLSTKTENQALADKKLGQLEAQ 240 

LRFFGIGESQLVT+L+D I+NQ DPT+APYAKTGEVTIiRLSTK +Q A++ L LE Q 
Sbjct: 180 LRFFGIGESQLWIIiADLIDNQIDPTIAPYAKTGEVTIjRLSTKASSQEFANQALDILENQ 239 

Query: 241 LLSRKTLEGQPLADVFYGYGEDNSLARETFEIjLvKYDKTITAAESLTAGLFQSTLASFPG 300 
+L +T EG L D YGYGE+ SLA E L + KTI AAESLTAGLFQ+T+A+F G 

Sbjct: 240 ILDCQTFEGISLRDFCYGYGEETSIASIVVEELKRQGKTIAAAESLTAGLFQATVANFSG 299 

Query: 301 ASQVFNGGFVTYSMEEKAKMLGLPLEELKSHGWSAYTAEGMAEQARLLTGADIGVSLTG 360 
S +F GGFVTYS+EEK++ML +P + L+ HGWS +TA+ MAEQAR T +D G+SLTG 
Sbjct: 300 VSSIFEGGFVTYSLEEKSRMLDIPAKNLEEHGWSEFTAQKMAEQARSKTQSDFGISLTG 359 

Query: 361 VAGPDMLEEQPAGTVFIGLATQNKVESIKVLISGRSRLDTOYIATLHAFNMVRKTLL 417 

VAGPD LE P GTVFIGLA E IKV I GRSR DVR+IA +HAFN+VRK LL 

Sbjct: 360 VAGPDSLEGHPVGTVFIGLAQDQGTEVIKVNIGGRSRADvRHIAVMHAFNLVRKALL 416 


An alignment of the GAS and GBS proteins is shown below. 

Identities = 201/299 (67%) , Positives = 242/299 (80%) 

Query: 1 MVEGSIPLQNLTGI^VGGIVTSKGVQYMVLPGPPSELKPMVMEQWPILSNNGTKLYSRV 60 
++EGSIPL N TGLAVGG +T G+ Y+VLPGPPSELKPMV E++VP+LS + LYS+V 

Sbjct: 121 VIEGSIPLPNKTGLAVGGFITVDGISYWLPGPPSELKPMVNEELVPLLSKQYSTLYSKV 180 

Query: 61 LRFFGIGESQLVTILEDIIKNQTDPTIAPYAKVGEVTLRLSTKAENQDEADFKLDSLEKE 120 
LRFFGIGESQLVT+L D I+NQTDPTIAPYAK GEVTLRLSTK ENQ AD KL LE + 
Sbjct: 181 LRFFGIGESQLVTVLSDFIENQTDPTIAPYAKTGEVTLRLSTKTENQALADKKLGQLEAQ 240 

Query: 121 ILALKTLDNRKLIODLLYGYGDNNSMARTVLELLKVQNKTITAAESLTAGLFQSQLAEFSG 180 

+L+ KTL+ + L D+ YGYG+ +NS+AR ELL +KTITAAESLTAGLFQS LA F G 
Sbjct: 241 LLSRKTLEGQPLADVFYGYGEDNSLARETFELLVKYDKTITAAESLTAGLFQSTLASFPG 300 






Query: 181 ASQVFNGGFTTYSMFJUCSQLLGIPKKKLQEYGWSHFTAEAMAQQARQLLKADFGIGLTG 240 

ASQVFNGGF TYSME K+++LG+P ++L+ +GWS +TAE MA+QAR L AD G+ LTG 
Sbjct: 301 ASQVFNGGFVTYSMEEKAKMLGLPLEELKSHGWSAYTAEGMAEQARLLTGADIGVSLTG 360 

Query: 241 VAGPDELEGYPAGTVFIGIATPEGVSSIKVSIGGKSRSDVRHISTLHAFDLVRRALLKI 299 

VAGPD LE PAGTVFIG+AT V SIKV I G+SR DVR+ 1 +TLHAF+ +VR+ LLK+ 
Sbjct: 361 VAGPDMLEEQPAGWFIGIATQNKVESIKVLISGRSRLDVRYIATLHAFNMVRKTLLKL 419 

SEQ ID 3348 (GBS646) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 131 (lanes 2-4; MW 61.6kDa) and in Figure 134 (lane 3; MW 57.5kDa + lanes 2 & 4; 
MW 27kDa). It was also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 131 (lanes 5-7; MW 36.6kDa) and in Figure 178 (lane 5; MW 37kDa). 

GBS646-His was purified as shown in Figure 229, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1086 

A DNA sequence (GBSx1161) was identified in S.agalactiae <SEQ ID 3351> which encodes the amino 
acid sequence <SEQ ID 3352>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.37 Transmembrane 148 - 164 ( 148 - 164) 

Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3353> which encodes the amino acid 
sequence <SEQ ID 3354>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.37 Transmembrane 148 - 164 ( 148 - 164) 

Final Results 

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

>GP:AAD04860 GB:AF069745 RecA protein [Streptococcus parasanguinis] 
Identities = 333/381 (87%), Positives = 356/381 (93%), Gaps = 3/381 (0%) 

Query: 1 LAKKIjKKNEEITKKFGDERRKALDDALKNIEKDFGKGAvMRLGERAEQKVQvMSSGSLAL 60 
+AKK KK ++ ITKKFGDER KAL+DALK IEKDFGKG++MRLGERAEQKVQVMSSGSLAL 

Sbjct: 1 MAKKQKKLDDITKKFGDEREKALNDALKLIEKDFGKGSIMRLGERAEQKVQVMSSGSLAL 60 

Query: 61 DIALGAGGYPKGRIIEIYGPESSGKTTVALHAVAQAQKEGGIAAFIDAEHALDPAYAAAL 120 
DIALGAGGYPKGRIIEIYGPESSGKTTvALHAVAQAQKEGGIAAFIDAEHALDP+YAAAL 
Sbjct: 61 DIALGAGGYPKGRIIEIYGPESSGKTTVALHAVAQAQKEGGIAAFIDAEHALDPSYAAAL 120 

Query: 121 GVNIDELLLSQPDSGEQGLEIAGKLIDSGAVDLVVVDSVAALVPRAEIDGDIGDSHVGLQ 180 

GVNIDELLLSQPDSGEQGLEIAGKLIDSGAVDLVVVDSVAALVPRAEIDGDIGDSHVGLQ 
Sbjct: 121 GVNIDELLLSQPDSGEQGLEIAGKLIDSGAVDLVVVDSVAALVPRAEIDGDIGDSHVGLQ 180 


Query: 181 ARMMSQAMRKLSASINKTKTIAIFINQIiREKVGVMFGNPETTPGGRALKFYASVRLDVRG 240 

ARMMSQAMRKL ASINKTKTIAIFINQLREKVGVMFGNPETTPGGRALKFYASVRLDVRG 
Sbjct: 181 ARMMSQAMRKLGASINKTKTIAIFINQLREKVGVMFGNPETTPGGRALKFYASVRLDVRG 240 

Query: 241 TTQIKGTGDQKDSSIGKETKIKVVKNKVAPPFKVAEVEIMYGEGISRTGELVKIASDLDI 300 

TQIKGTGDQKD+++GKETKIKVVKNKVAPPFK A VEIMYGEGISRTGELVKIA+DLDI 
Sbjct: 241 OTQIKGTGDQKDTNVGKETKIKVVKNKVAPPFKEJJWEIMYGEGISRTGELVKIATDLDI 300 

Query: 301 IQKAGAWFSYNGEKIGQGSENAKRYLADHPELFDEIDLKVRVKFGLLEESEEESAMAVAS 360 

IQKAGAW+SYNGEKIGQGSENAK++LADHPE+FDEID KVRV FGL+E+ E ++ 
Sbjct: 301 IQKAGAWYSYNGEKIGQGSENAKKFLADHPEIFDEIDHKVRVHFGLIEKDEAVKSLDKTE 360 

Query: 361 EE TDDLALDLDNGIEIED 378 

E +++ LDLD+ IEIED 

Sbjct: 361 EAAP WEEVTLDLDDAIE I ED 381 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 339/379 (89%), Positives = 356/379 (93%), Gaps = 1/379 (0%) 

Query: 1 MAKKTKKAEEITKKFGDERRKALDDALKNIEKDFGKGAVMRLGERAEQKVQVMSSGSLAL 60 
+AKK KK EEITKKFGDERRKALDDALKNIEKDFGKGAVMRLGERAEQKVQVMSSGSLAL 
Sbjct: 1 LAKKLKKNEEITKKFGDERRKALDDALKNIEKDFGKGAVMRLGERAEQKVQVMSSGSLAL 60 

Query: 61 DIALGAGGYPKGRIVEIYGPESSGKTTVALHAVAQAQKEGGIAAFIDAEHALDPAYAAAL 120 
DIALGAGGYPKGRI+EIYGPESSGKTTVALHAVAQAQKEGGIAAFIDAEHALDPAYAAAL 
Sbjct: 61 DIALGAGGYPKGRIIEIYGPESSGKTTVALHAVAQAQKEGGIAAFIDAEHALDPAYAAAL 120 

Query: 121 GVNIDELLLSQPDSGEQGLEIAGKLIDSGAVDLVVVDSVAALVPRAEIDGDIGDSHVGLQ 180 
GVNIDELLLSQPDSGEQGLEIAGKLIDSGAVDLVVVDSVAALVPRAEIDGDIGDSHVGLQ 
Sbjct: 121 GVNIDELLLSQPDSGEQGLEIAGKLIDSGAVDLVVVDSVAALVPRAEIDGDIGDSHVGLQ 180 

Query: 181 ARMMSQAMRKLSASINKTKTIAIFINQLREKVGVMFGNPETTPGGRALKFYSSVRLDVRG 240 
ARMMSQAMRKLSASINKTKTIAIFINQLREKVGVMFGNPETTPGGRALKFY+SVRLDVRG 
Sbjct: 181 ARMMSQAMRKLSASINKTKTIAIFINQLREKVGVMFGNPETTPGGRALKFYASVRLDVRG 240 

Query: 241 NTQIKGTGEHKDHNVGKETKIKVVKNKVAPPFREAFVEIMYGEGISRTGELIKIASDLDI 300 
TQIKGTG+ KD ++GKETKIKVVKNKVAPPF+ A VEIMYGEGISRTGEL+KIASDLDI 
Sbjct: 241 TTQIKGTGDQKDSSIGKETKIKVVKNKVAPPFKVAEVEIMYGEGISRTGELVKIASDLDI 300 

Query: 301 IQKAGAWYSYWBFJ^GQGSEtlAKKXIiADNPAIFDEIDHKWvHFGMTEDDSPVQSELWE 360 
IQKAGAW+SYNGEKIGQGSENAK+YLAD+P +FDEID KVRV FG+ E +S +S + 
Sbjct: 301 IQKAGAWFSYNGEKIGQGSENAKRYLADHPELFDEIDLKVRVKFGLLE-ESEEESAMAVA 359 

Query: 361 KNEADDLVLDLDNAIEIEE 379 
E DDL LDLDN IEIE+ 
Sbjct: 360 SEETDDLALDLDNGIEIED 378 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
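
The "Final Results" blocks in this document list three candidate localizations with a certainty value each. As a reading aid only, the following minimal Python sketch shows one way such lines could be collected and the highest-certainty compartment selected. It is not the method used to generate the results; the names parse_localization and top_compartment and the regular expression are hypothetical, and the pattern assumes only the line layout visible in this text (including stray spaces inside the number).

```python
import re

# Hypothetical pattern modeled on the result lines as printed in this document.
# \W* skips whatever separator (dashes, spaces) sits before "Certainty".
_RESULT = re.compile(
    r"bacterial\s+(cytoplasm|membrane|outside)\W*Certainty\s*=\s*"
    r"([0-9. ]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_localization(lines):
    """Map each compartment to (certainty, verdict) for the given lines."""
    results = {}
    for line in lines:
        match = _RESULT.search(line)
        if match:
            compartment, raw, verdict = match.groups()
            # Remove spaces that text extraction may have left inside the number.
            results[compartment] = (float(raw.replace(" ", "")), verdict)
    return results


def top_compartment(results):
    """Return the compartment with the highest certainty value."""
    return max(results, key=lambda name: results[name][0])


example = [
    "bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>",
    "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>",
    "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>",
]
print(top_compartment(parse_localization(example)))  # membrane
```

When two compartments share the same certainty, max keeps the first one encountered, so the order of the input lines decides ties.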

Example 1087 

A DNA sequence (GBSx1162) was identified in S.agalactiae <SEQ ID 3355> which encodes the amino 
acid sequence <SEQ ID 3356>. Analysis of this protein sequence reveals the following: 

Possible site: 26 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2344 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10259> which encodes amino acid sequence <SEQ ID 
10260> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG37358 GB:AF028804 NrpR [Lactococcus lactis subsp. cremoris] 
Identities = 69/132 (52%), Positives = 102/132 (77%) 

Query: 5 MIKIYTISSCTSCKKAKTWLNAHQLPYKEQNLGKESLTRDEILEILTKTESGIESIVSSK 64 
MI IYT SCTSCKKAKTWL+ H +P+ E+NL + L+ EI +IL K + G+E ++SS+ 
Sbjct: 1 MITIYTAPSCTSCKKAKTWLSYHHIPFNERNLIADPLSTTEISQILQKCDDGVEGLISSR 60 

Query: 65 NRYAKALNCNIEELSVNEVIDLIQENPRILKSPILIDDKRLQVGYKEDDIRAFLPRSIRN 124 

NR+ K L + E++S+++ I +I ENP+I++ PI++D+KRL VGY E++IRAFLPR++R 
Sbjct: 61 NRFVKTLGVDFEDISLSQAIKIISENPQIMRRPIIMDEKRLHVGYNEEEIRAFLPRTVRV 120 

Query: 125 VENAEARLRAAL 136 

+EN ARLR+A+ 
Sbjct: 121 LENGGARLRSAI 132 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3357> which encodes the amino acid 
sequence <SEQ ID 3358>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2569 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 117/132 (88%), Positives = 128/132 (96%) 

Query: 5 MIKIYTISSCTSCKKAKTWLNAHQLPYKEQNLGKESLTRDEILEILTKTESGIESIVSSK 64 

MIKIYTISSCTSCKKAKTWLNAH+L YKEQNLGKE LT++EIL IL+KTE+G+ESIVSSK 
Sbjct: 1 MIKIYTISSCTSCKKAKTWLNAHKLAYKEQNLGKEPLTKEEILAILSKTENGVESIVSSK 60 

Query: 65 NRYAKALNCNIEELSVNEVIDLIQENPRILKSPILIDDKRLQVGYKEDDIRAFLPRSIRN 124 

NRYAKAL+C+IEELSV+EVIDLIQ+NPRILKSPILIDDKRLQVGYKEDDIRAFLPRSIRN 
Sbjct: 61 NRYAKALDCDIEELSVSEVIDLIQDNPRILKSPILIDDKRLQVGYKEDDIRAFLPRSIRN 120 

Query: 125 VENAEARLRAAL 136 

+EN EARLRAAL 
Sbjct: 121 IENTEARLRAAL 132 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1088 

A DNA sequence (GBSx1163) was identified in S.agalactiae <SEQ ID 3359> which encodes the amino 
acid sequence <SEQ ID 3360>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3097 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04987 GB:AP001511 unknown [Bacillus halodurans] 
Identities = 49/82 (59%) , Positives = 64/82 (77%) , Gaps = 1/82 (1%) 

Query: 1 MGFTDETVRFRLDDSN-KVEISETLTAVYRSLEEKGYNPINQIVGYVLSGDPAYVPRYND 59 

M D T++F +++ V++ E L +VY +LEEKGYNPINQIVGY+LSGDPAY+PR+ D 
Sbjct: 1 MSSMDNTMKFNVNEEPVSVDVQEVLMSVYEALEEKGYNPINQIVGYLLSGDPAYIPRHKD 60 

Query: 60 ARNQIRKYERDEIVEELVRYYL 81 

AR  IRK ERDE++EELV+ YL 
Sbjct: 61 ARTLIRKLERDELIEELVKSYL 82 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3361> which encodes the amino acid 
sequence <SEQ ID 3362>. Analysis of this protein sequence reveals the following: 

Possible site: 49 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3097 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 80/88 (90%), Positives = 85/88 (95%) 

Query: 1 MGFTDETVRFRLDDSNKVEISETLTAVYRSLEEKGYNPINQIVGYVLSGDPAYVPRYNDA 60 
MGFTDETVRF+LDD +K +ISETLTAVY SL+EKGYNPINQIVGYVLSGDPAYVPRYNDA 
Sbjct: 1 MGFTDETVRFKLDDGDKRQISETLTAVYHSLDEKGYNPINQIVGYVLSGDPAYVPRYNDA 60 

Query: 61 RNQIRKYERDEIVEELVRYYLQGNGIDL 88 

RNQIRKYERDEIVEELVRYYLQGNGID+ 
Sbjct: 61 RNQIRKYERDEIVEELVRYYLQGNGIDV 88 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1089 

A DNA sequence (GBSx1164) was identified in S.agalactiae <SEQ ID 3363> which encodes the amino 
acid sequence <SEQ ID 3364>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1575 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10257> which encodes amino acid sequence <SEQ ID 
10258> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14698 GB:Z99118 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 82/138 (59%), Positives = 109/138 (78%), Gaps = 1/138 (0%) 

Query: 1 MRIMGLDVGSKTVGVAISDPLGFTAQGLEIIKIDEESGNFGFDRLAELVKEYKVDKFVVG 60 

MRI+GLD+G+KT+GVA+SD +G+TAQG+E IKI+E G++G RL+EL+K+Y +DK V+G 
Sbjct: 1 MRILGLDLGTKTLGVALSDEMGWTAQGIETIKINEAEGDYGLSRLSELIKDYTIDKIVLG 60 

Query: 61 LPKNMNNTSGPRVFASQAYGDKITELFNLPVEYQDERLTTVQAERMLVEQADISRGKRKK 120 
PKNMN T GPR EASQ + + +N+PV DERLTT+ AE+ML+ AD+SR KRKK 

Sbjct: 61 FPKNTWGWGPRGEASQTFAKVLETTYNVPvVLTOERLTTMAAEKMLI-AADVSRQKRKK 119 

Query: 121 VIDKLAAQLILQNYLDRM 138 
VIDK+AA +ILQ YLD + 
Sbjct: 120 VIDKMAAVMILQGYLDSL 137 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3365> which encodes the amino acid 
sequence <SEQ ID 3366>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1575 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 114/139 (82%) , Positives = 126/139 (90%) 

Query: 1 MRIMGLDVGSKTVGVAISDPLGFTAQGLEIIKIDEESGNFGFDRLAELVKEYKVDKFVVG 60 

MRIMGLDVGSKTVGVAISDPLGFTAQGLEIIKIDEE   FGF RL ELVK+Y+V++FV+G 
Sbjct: 1 MRIMGLDVGSKTVGVAISDPLGFTAQGLEIIKIDEEKAEFGFTRLEELVKQYQVEQFVIG 60 

Query: 61 LPKNMNNTSGPRVEASQAYGDKITELFNLPVEYQDERLTTOQAERMLVEQADISRGKRKK 120 

LPKNMNNT+GPRV+AS YG+ I LF LPV YQDERLTTV+A+RML+EQADISRGKRKK 
Sbjct: 61 LPKNMNNTNGPRTOASITYGNHIEHLFGLPVHYQDERLTTVEAKRMLIEQADISRGKRKK 120 

Query: 121 VIDKLAAQLILQNYLDRMF 139 

VIDKLAAQLILQNYL+R F 
Sbjct: 121 VIDKLAAQLILQNYLNRNF 139 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1090 

A DNA sequence (GBSx1165) was identified in S.agalactiae <SEQ ID 3367> which encodes the amino 
acid sequence <SEQ ID 3368>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2631 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB14697 GB:Z99118 yrzB [Bacillus subtilis] 
Identities = 50/94 (53%) , Positives = 65/94 (68%) , Gaps = 5/94 (5%) 

Query: 12 EHQHEVITLVDENGNETLFEILLTIDGREEFGKNYVLLVPAGAEEDEQGEIEIQAYSFTE 71 

EH + IT+VD+ GNE L E+L T + EEFGK+YVL P +++DE E+EI A SFT 
Sbjct: 2 EHGEKNITIVDDQGNEQLCEVLFTFEN-EEFGKSYVLYYPIESKDDE--EVEILASSFTP 58 

Query: 72 NADGTEGDLQPIPEDSDAEWDMIEEVFNSFLDEE 105 

N DG G+L PI ++D EWDMIEE N+FL +E 
Sbjct: 59 NEDGENGELFPI--ETDEEWDMIEETLNTFLADE 90 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3369> which encodes the amino acid 
sequence <SEQ ID 3370>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3170 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 90/98 (91%) , Positives = 94/98 (95%) 

Query: 7 HDHNHEHQHEVITLVDENGNETLFEILLTIDGREEFGKNYVLLVPAGAEEDEQGEIEIQA 66 

H+H ++HQHEVITLVDE GNETLFEILLTIDGREEFGKNYVLLVPAG+EEDE GEIEIQA 
Sbjct: 3 HNHENDHQHEVITLVDEQGNETLFEILLTIDGREEFGKNYVLLVPAGSEEDESGEIEIQA 62 

Query: 67 YSFTENADGTEGDLQPIPEDSDAEWDMIEEVFNSFLDE 104 

YSFTEN DGTEGDLQPIPEDSDAEWDMIEEVFNSFLDE 
Sbjct: 63 YSFTENEDGTEGDLQPIPEDSDAEWDMIEEVFNSFLDE 100 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1091 

A DNA sequence (GBSx1166) was identified in S.agalactiae <SEQ ID 3371> which encodes the amino 
acid sequence <SEQ ID 3372>. Analysis of this protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1092 

A DNA sequence (GBSx1167) was identified in S.agalactiae <SEQ ID 3373> which encodes the amino 
acid sequence <SEQ ID 3374>. This protein is predicted to be an unnamed protein product. Analysis of this 
protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2059 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 



Final Results 

bacterial membrane --- Certainty=0.4673 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10255> which encodes amino acid sequence <SEQ ID 
10256> was also identified. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3375> which encodes the amino acid 
sequence <SEQ ID 3376>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related sequence was also identified in GAS <SEQ ID 9179> which encodes the amino acid sequence 
<SEQ ID 9180>. Analysis of this protein sequence reveals the following: 

Possible cleavage site: 13 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial membrane --- Certainty=0.395 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 200/480 (41%) , Positives = 310/480 (63%) , Gaps = 1/480 (0%) 

Query: 40 ILLYSVLSTLLAIANPLLTYFANGLQTQNLYTGLMMTKGQIPYSDVFATGGFLYYVTIAL 99 

+L +S++ + L IA P LT ANGLQ+QNLY G+M+TKGQ+PYS F TGG Y+V IAL 
Sbjct: 40 LLFFSIIISSLTIAVPFLTDAANGLQSQNLYIGMMLTKGQLPYSAAFTTGGLFYFVIIAL 99 

Query: 100 SYLLGSSIWLLIVQFIAYYVSGIYFYKLVYYVAQSEIVSIGMTLIFYIMNIVLGFGGMYP 159 

SY LGS++WL+ VQ +Y+SG+Y YKL+ Y+ + V++ ++ +Y++++ LGFGG+YP 
Sbjct: 100 SYYLGSTLWLVFVQVFCFYLSGLYLYKLINYMTGFQKVALTFSISYYLLSVSLGFGGLYP 159 

Query: 160 IQWALPFMLISLWFLIKFCVDNIVDEAFIFYGILAAFSLFIDPQTLIFWLCSFVLLTATN 219 

Q A+PF+LIS WFL K+ + DEAFI +G + A ++ IDP TLIFW + V + + N 
Sbjct: 160 TQLAMPFILISAWFLTKYFACLVKDEAFILFGFVGALAMLIDPSTLIFWSFACVTVFSYN 219 

Query: 220 IKQKQSLRGFYQFLCWFGMILIAYTVGYFMFNLQIISSYIDKAIFYPFTYFARTNHSFL 279 

I QK RGFYQ L +FGMIL+ YT GYF+ NLQ+++ Y+ + + YPFT+F N S L 
Sbjct: 220 ISQKHLARGFYQLLASIFGMILVFYTAGYFILNLQvLNPYLSQTMIYPFTFFKSGNLSLL 279 

Query: 280 LSLAIQIWLLGSGCLFGLWDFIQNRKKASYQIGLNFIACIFIIYAIMAIFSRDFNLYHF 339 

LAIQ+ LG G L G+ + 1+ K S ++ + + ++AIFS+D+ YH 

Sbjct: 280 FGLAIQLFFALGLGLLTG^NVIRRFKNNSDRVVKWLFVMVILESILVAIFSQDYRPYHL 339 

Query: 340 LPALPFGLLLTSNKITILYQKVIDRRSHRRQY-FSGKSLIVDLFVKKTYYLPLLLVSLSI 398 

LP LPFGL+LT+ + Y + + SHRR++ +G ++ +++K+ +YLP+L+V + 
Sbjct: 340 LPLLPFGLILTAIPVGYQYGIGLGQSSHRRRHGKNGVGRVMMIYLKRHFYLPIL1VGTIL 399 

Query: 399 GLLVYOTYQNVTLSKERRDISHYLTTKIDRDGKIYVWDKVASIYSQTRLKSASQFVLPHI 458 

Y ++ L++ER 1+ YL K+++ IYVWD + IY ++ KS SQF P I 
Sbjct: 400 ICSTYCFISSIPLNQERDHIASYLEQKLNKTQSIYVWDDTSKIYLDSKAKSVSQFSSPDI 459 

Query: 459 OTAQKHNEKILKDELLQHGAKYFIIiNKlffiKLPNELKSDIKKHYQEVPLSNITHFVLYRFK 518 

NT ++++ KIL+DELL++ A Y ++N+ + LP ++ + +Y+ F++Y+ K 
Sbjct: 460 OTQKESHRKILEDELLENKAAYIVVmYKNLPKIIQKVLSTNYKVDKQITTKSFlVYQKK 519 

A related GBS gene <SEQ ID 8727> and protein <SEQ ID 8728> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 0 
SRCFLG: 0 
McG: Length of UR: 34 
Peak Value of UR: 2.23 
Net Charge of CR: 0 
McG: Discrim Score: 7.72 
GvH: Signal Score (-7.5): -2.21 
Possible site: 60 
>>> Seems to have a cleavable N-term signal seq. 
Amino Acid Composition: calculated from 61 
modified ALOM score: 2.34 
icml HYPID: 7 CFP: 0.467 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4673 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the databases: 

ORF02392(331 - 978 of 1764) 

EGAD|43696|MJ1079(2 - 379 of 397) conserved hypothetical protein {Methanococcus jannaschii} 
OMNI|MJ1079 conserved hypothetical protein GP|1591727|gb|AAB99076.1||U67550 conserved 
hypothetical protein {Methanococcus jannaschii} PIR|F64434|F64434 hypothetical protein 
MJ1079 - Methanococcus jannaschii 
%Match = 3.1 
%Identity = 25.6 %Similarity = 50.7 
Matches = 57 Mismatches = 100 Conservative Sub.s = 56 

174 204 234 264 294 324 354 

*LLIiANI*LSVHPTSFFTXXXN*LXXSSIWLLIVQFIAYWSGIYFYKLVYYVAQSEIVSIGMTLIFYIMNIVLG 

YQFLCWFGMILIAYTVGYFMFNLQI ISSYIDKAIFYPFTYFARTNHSFLLSLAI -QI WLLGSGC 

=1 III :::=:: |::: | III I : = = == :::|:« II 

GSYLGWFSILISLFLMSILHFDVRAFYCSI--KIFIPFILIAFILYQIFTAKSVWEVLVIFLSGIFGIAVLYCSEAFNI 
110 120 130 140 150 160 170 


798 828 846 876 

LFGLWDFIQNRKKASYQ- - - - IGLNFIACIFI 

:||: : | | | : : : |:: || 

TLTAIFTGMFGIPLLINNLKTYKIKSQMMAFPDFELKFLKSSFFA TIAIIILLNLSKYILLFIRKVNFKFLSLFFI 

190 200 210 220 320 330 

906 948 978 1008 1038 1068 1098 

IYAIMAIFSRDFN- - -LYH- - - FLPALPFGLLLTSNKITILYQKVIDRRSHRRQYFSGKSLIVDLFVKKTYYLPLLLVSL 
|: : : :| :|| :| |: ||| : : 

IFCSLWIIGSYNTYLIYHIIVYLTAIYIGLLAVKSNTNLSNMMNVLIFPTILYFLRG 
350 360 370 380 390 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1093 

A DNA sequence (GBSx1168) was identified in S.agalactiae <SEQ ID 3377> which encodes the amino 
acid sequence <SEQ ID 3378>. This protein is predicted to be anaerobic ribonucleotide reductase (nrdD). 
Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3722 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 10253> which encodes amino acid sequence <SEQ ID 
10254> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD00215 GB:U73336 anaerobic ribonucleotide reductase [Lactococcus lactis subsp. cremoris] 
Identities = 539/725 (74%), Positives = 616/725 (84%), Gaps = 7/725 (0%) 

Query: 10 MTESDIKVIKRDGRLVSFDKYKIYTALLKASNKVIKMSPLVEAKLEMIADHVIAEIYNRF 69 
+T +I VIKRDGR V F+ KI+ AL KA+ KV V L + D V++EI++RF 

Sbjct: 10 VTLEE INVI KRDGRSVKFNSEKI FDALTKAAKKVELTDKSV LSELTDRWSE I FSRF 66 

Query: 70 KDNIKIYEIQNIVEHKLLEANEYAIAQEYINYRTQRDFERSQATDINFSIGKLINKDQTV 129 
+N+KIYEIQ+IVE +LLE+ E A+A+EYI+YR RD R++ATDINF+I KLIN+DQTV 
Sbjct: 67 SENVKIYEIQSIVEQELLESGETAIAEEYISYRANRDLARTKATDINFTIEKLINRDQTV 126 

Query: 130 VNENANKDSDVFNTQRDLTAGIVGKSIGLKMLPSHVANAHQKGDIHYHDLDYSPYTPMTN 189 

VNENANKDS+VFNTQRDLTAG V K+IGLK+LP HVANAHQKGDIHYHDLDYSP+T M N 
Sbjct: 127 VNENANKDSNVFNTQRDLTAGAVSKAIGLKLLPPHVANAHQKGDIHYHDLDYSPFTTMAN 186 


Query: 190 CCLIDFKGMIiANGFKIGNAEVESPKSIQTATAQISQIIANVASSQYGGCTADRIDEFLAP 249 

CCLIDFK M NGFK+GNA+V+SPKSIQTATAQ SQIIANVASSQYGGC+ DR DE LAP 
Sbjct: 187 CCLIDFKJMFFjNGFKLGNAQVDSPKSIQTATAQASQIIANVASSQYGGCSFDRADEVLAP 246 

Query: 250 YAQLNYQKHLKDAKEWVIED-KQEDYARAKTQKDIYDAMQSLEYEINTLFTSNGQTPFTS 308 

YA+LNYQKHLKDA++W+ D K+E YAR KT KDIYDAMQSLEYEINTLFTSNGQTPF + 
Sbjct: 247 YAKIOTQKHLKDAQKWIDGDEKRFAYAREKTAKD1YDAMQSLEYEINTLFTSNGQTPFVT 306 



Query: 309 LGFGLGTNWFEREIQKAILKIRIQGLGSEHRTAIFPKLIFTLKKGLNLEEDSPNYDIKQL 368 
+GFGLG +W+ REIQKAILK+RI GLGSEHRTAIFPKLIFTLK+GLNLE +PNYDIK+L 
Sbjct: 307 VGFGLGDDWYAREIQKAILKVRIGGLGSEHRTAIFPKLIFTLKRGLNLEVGTPNYDIKEL 366 

Query: 369 ALEC^TKRMYPDVLSYDKIIDLTGSFKAPMGCRSFLOGWRDANGQDVTSGR^LGVVTVN 428 
ALEC+TKRMYPD+LSYDKI++LTGSFKA MGCRSFLQGW+DANG DVT+GR NLGWTVN 

Sbjct: 367 ALECSTKRMyPDILSYDKIVELTGSFKASMGCRSFLQGWKDANGNDOTAGRMSILGVVTVN 426 

Query: 429 LPRVAMESNGDMDKFWEIFNERMSIARDALVYRVERVKEAIPANAPILYQYGAFGERLGK 488 
LPR+A+E+ G+ +KFWEIFNER+ IA DAL +RVER KEA P NAPIL+ GA G RL 
Sbjct: 427 LPRIALEAAGNKEKFWEIFNERVEIAHDALAFRVERAKEAQPKNAPILFMNGALG-RLDS 485 

Query: 489 YDNVDRLFNHRRAWSLGYIGLYEVASVFYGGDWFJ3NHQAKAFTVDITOKMKQLCADWSD 548 

+VD L+N+ RATVSLGYIGLYEVA+ FYG WE N +AKAFT++IV++M + C DWS 
Sbjct: 486 EGSVDDLYMNERATVSLGYIGLYEVATTFYGPTWESNPEAKAFTIEIVKRMHEDCEDWSK 545 


Query: 549 EYDYHFSVYSTPSESLTDRFCRLDTEKFGIVTDITDKEYYTNSFHYDVRKNPTPFEKLDF 608 

YH+SVYSTPSESLTDRFCR+D EKFG V DITDK+YYTNSFHYDVRKNPTPFEKL+F 
Sbjct: 546 ASGYHYSVYSTPSESLTDRFCRMDKEKFGSVADITDKDYYTNSFHYDVRKNPTPFEKLEF 605 

Query: 609 EKIYPETGASGGFIHYCEYPVLQQNPKALEAVWDYAYDRVGYLGTNTPIDKCYQCQFEGD 668 

EK YP A+GGFIHYCEYPVLQQNPKALEAVWD+AYDR+GYLGTN PID CY C FEGD 
Sbjct: 606 EKDYP-VYANGGFIHYCEYPVLQQNPKALEAVWDFAYDRIGYLGTNAPIDHCYACGFEGD 664 

Query: 669 FTPTDRGFTCPNCGNSDPKTVDWKRTCGYLGNPQARPMVNGRHKEISARVKHMNGS-SI 727 
FTPT+RGF CP CGN DPKT DWKRTCGYLGNPQARPMV+GRHKEIS+RVKHMNGS 

Sbjct: 665 FTPTERGFKCPQCGNDDPKTCDWKRTCGYLGNPQARPMVHGRHKEISSRVKHMMGSVGA 724 

Query: 728 KNQGN 732 
N GN 

Sbjct: 725 LNDGN 729 

A related DNA sequence was identified in S. pyogenes <SEQ ID 3379> which encodes the amino acid 
sequence <SEQ ID 3380>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2975 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 641/731 (87%) , Positives = 680/731 (92%) 

Query: 1 MMVLERERFMTESDIKVIKRDGRLVSFDKYKIYTALLKASNKVIKMSPLVEAKLEMIADH 60 

M+ LE ++ + DIKVIKRDGRLV+FD KIY+ALLKAS KV +MS PLVEAKLE I+D 
Sbjct: 1 MVSLEEDKVTVQPDIKVIKRDGRLVNFDSTKIYSALLKASMKVTRMSPLvEAKLEAISDR 60 

Query: 61 VIAEIYNRFKDNIKIYEIQNIVEHKLLEANEYAIAQEYINYRTQRDFERSQATDINFSIG 120 
+IAEI RF NIKIYEIQNIVEHKLL ANEYAIA+EYINYRTQRDF RSQATDINFSI 

Sbjct: 61 IIAEIIERFPTNIKIYEIQNIVEHKLLAANEYAIAKEYINYRTQRDFARSQATDINFSID 120 

Query: 121 KLINKDQTVVNENANKDSDVFNTQRDLTAGIVGKSIGLKMLPSHVANAHQKGDIHYHDLD 180 
KLINKDQTVVNENANKDSDVFNTQRDLTAGIVGKSIGLKMLPSHVANAHQKGDIHYHDLD 
Sbjct: 121 KLINKDQTVVNENANKDSDVFNTQRDLTAGIVGKSIGLKMLPSHVANAHQKGDIHYHDLD 180 

Query: 181 YSPYTPMTNCCLIDFKGMLANGFKIGNAEVESPKSIQTATAQISQIIANVASSQYGGCTA 240 

YSPYTPMTNCCLIDFKGMLANGFKIGNAEVESPKSIQTATAQISQIIANVASSQYGGCTA 
Sbjct: 181 YSPYTPMTNCCLIDFKGMLANGFKIGNAEVESPKSIQTATAQISQIIANVASSQYGGCTA 240 


Query: 241 DRIDEFIAPYAQLNYQKHLKDAKEWVIEDKQEDYARAKTQKDIYDAMQSLEYEINTLFTS 300 

DRIDEFLAPYA+LN++KH+ DAK+W++E K+E YA KTQKDIYDAMQSLEYEINTLFTS 
Sbjct: 241 DRIDEFLAPYAELNFKKHMADAKKOTI vETKRESYAFEKTQKDIYDAMQSLEYEINTLFTS 300 

Query: 301 NGQTPFTSLGFGLGTNWFEREIQKAILKIRIQGLGSEHRTAIFPKLIFTLKKGLNLEEDS 360
Sbjct: 301 NGQTPFTSLGFGLGTSWFEREIQKAILTIRIHGLGSEHRTAIFPKLIFTVKRGLNLEPDS 360

Query: 361 PNYDIKQIALECATKRMyPDVLSYDKIIDLTGSFKAPMG(^ 420
Sbjct: 361 PNYDIKTLALECATKRMYPDMLSYDKIIDLTGSFKSPMGCRSFLQGWKDENGQDVTSGRM 420

Query: 421 NLGVTOTOLPRVAMESNGDMDKFWEIFNERMSIU^ 480
[match line illegible]
Sbjct: 421 j^GVVTLNLPRIAMESNGDMDKFW^ 480

Query: 481 AFGERLGKYDNVDRLITMRRATVSLGYIGLY^ 540
[match line illegible]
Sbjct: 481 AFGKRLEKTGNVM3LFKNRRATVSLGYIGLYEVASVFYGGQWEGNPDAKAFTLSIVKAMK 540

Query: 541 QLCADWSDEYDYHFSVYSTPSESLTDRFCRLDTEKFGIVTDITDKEYYTNSFHYDVRKNP 600
[match line illegible]
Sbjct: 541 QACEDWSDEYGYHFSVYSTPSESLTDRFCRLDTEKFGIVTDITDKEYYTNSFHYDVRKSP 600

Query: 601 TPFEKliDFEKIYPETGASGGFIHYCEYPVLQQNPKALEAVWDYAYDRVGYLGTNTPIDKC 660
Sbjct: 601 TPFEKLDFEKDYPEAGASGGFIHYCEYPVLQQNPKALEAVWDYAYDRVGYLGTNTPIDKC 660

Query: 661 YQCQFEGDFTPTDRGFTCPNCGNSDPKTVDWKRTCGYLGNPQARPMVNGRHKEISARVK 720
Y CQFEGDFTPT+RGFTCPNCGN+DPKTVDWKRTCGYLGNPQARPMVNGRHKEISARVK
Sbjct: 661 YNCQFEGDFTPTERGFTCPNCGNNDPKTVDWKRTCGYLGNPQARPMVNGRHKEISARVK 720

Query: 721 HMNGSSIKNQG 731
HMNGS+IK G
Sbjct: 721 HMNGSTIKYPG 731

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1094 

A DNA sequence (GBSx1169) was identified in S.agalactiae <SEQ ID 3381> which encodes the amino
acid sequence <SEQ ID 3382>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5372 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
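
The localisation rows under "Final Results" follow a fixed pattern (location, certainty score, verdict), so they can be read mechanically. The following is a minimal sketch, not part of the original analysis: the function names `parse_final_results` and `best_location` are illustrative assumptions, and the pattern is deliberately tolerant of stray spaces inside the number and of an optional dash run before "Certainty".

```python
import re

# One "Final Results" row: location, certainty, verdict.
# Tolerates spaces inside the number (e.g. "0 . 5372") and optional dashes.
ROW = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9][0-9. ]*?)\s*\(([^)]+)\)"
)

def parse_final_results(lines):
    """Return {location: (certainty, verdict)} for every row that parses."""
    results = {}
    for line in lines:
        m = ROW.search(line)
        if m:
            location, number, verdict = m.groups()
            results[location] = (float(number.replace(" ", "")), verdict.strip())
    return results

def best_location(results):
    """Return the location with the highest certainty score."""
    return max(results, key=lambda loc: results[loc][0])

rows = [
    "bacterial cytoplasm --- Certainty=0.5372 (Affirmative) < succ>",
    "bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>",
    "bacterial outside --- Certainty=0.0000 (Not Clear) < succ>",
]
parsed = parse_final_results(rows)
print(best_location(parsed), parsed[best_location(parsed)])
```

For the three rows above this selects the cytoplasm row, the only one with a non-zero certainty.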

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3383> which encodes the amino acid 
sequence <SEQ ID 3384>. Analysis of this protein sequence reveals the following: 

Possible site: 45 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6084 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 28/47 (59%), Positives = 40/47 (84%), Gaps = 1/47 (2%)

Query: 1 MGKYQLDYKGQAQVQKFHEKHSTGENANQKSRLKDLRKQFLEKAKKK 47
MGKYQLDYKG QV++FHEKHS + ++KSR+++L+ +FLEK+KK+
Sbjct: 1 MGKYQLDYKGMQQVERFHEKHSK-KKTDKKSRVQELKftRFLEKSKKQ 46
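
In BLAST pairwise output, a letter on the match line between Query and Sbjct marks an identical residue and a `+` marks a conservative substitution; "Positives" counts both. As a rough sketch (the helper name `count_match_line` is an assumption made for illustration), the two totals in the summary line can be recovered from the match line alone, even when its spacing has been collapsed:

```python
def count_match_line(match_line):
    """Count identities (letters) and positives (letters plus '+') on a BLAST match line."""
    identities = sum(ch.isalpha() for ch in match_line)
    conservative = match_line.count("+")
    return identities, identities + conservative

# Match line of the 47-residue GAS/GBS alignment shown above.
midline = "MGKYQLDYKG QV++FHEKHS + ++KSR+++L+ +FLEK+KK+"
identities, positives = count_match_line(midline)
print(identities, positives)  # 28 40
```

The result agrees with the reported "Identities = 28/47" and "Positives = 40/47". The alignment length (47) cannot be recovered this way, because it depends on the blank columns of the match line.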

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1095 

A DNA sequence (GBSx1170) was identified in S.agalactiae <SEQ ID 3385> which encodes the amino
acid sequence <SEQ ID 3386>. Analysis of this protein sequence reveals the following:

Possible site: 51
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0436 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB95794 GB:AL359949 putative oxidoreductase [Streptomyces
coelicolor A3(2)]

Identities = 91/299 (30%), Positives = 147/299 (48%), Gaps = 7/299 (2%)

Query: 2 LQLGIVGLGGISQKAYLPYMRQVTGVHWHLFTRQKQILEEV- -NMLFGSSTAYDSLDSLA 59 

+++G +GLG I+QK YLP + +G+HLTR LV + + + LD+L 
Sbjct: 1 MKVGCIGLGDIAQKGYLPVLAALPGIELHLQTRTPATLTRVADK1RIPPAQRHADLDALL 60 


Query: 60 EHPLDGVFIHVATSAHFDIAKLFLKKGIPVFMDKPLTEDYTSTKALYDLAKDHKTFLMAG 119 

LD F+H T+AH +1 L+ G+P ++DKPL + ++ L LA++ T L G 
Sbjct: 61 AQGLDAAFVmPTAAHPEIWRLLEAGVPTYVDKPLAYELADSERLVTIAEERGTSIAVG 120 

Query: 120 FZ^RFAPRIMEMKroffiDKNHIRTFKNAVN^ADFQYKLFDMFIHPLDTALFLTNNWKEG 179

FNRR AP + + +1 KN PD++D FIH +DT FL V 
Sbjct: 121 FNRRHAPGYAQCAE-HPRELILMQKNRTGLPEDPRTMILDDFIHVVDTLRFLVPGPVDDV 179 

Query: 180 YFVTKRDGNKILQVSVTLETDSEIIEASMNLQSGSRREIIEIESPEVTYSLDDLSNLSVI 239
++G+V+LD MN SGS EI+E+ + + +L+ VI

Sbjct: 180 TVRARTEGGLLHHVVLQIAGDGFTALGViyiNRLSGSAEEILEVSGQDTKRQVvNLA--EVI 237 

Query: 240 DGFDRRAI-GFGSWASTLEKRGFEPMIDAFIQAITTGVNPISPKSSLLSHFICDQINKA 297 
D + + G W +RG E + AF+ A+ +G +S + +L +H +C+++ +A 

Sbjct: 238 DHKGQPTVRRRGDWVPVARQRGIEQAVLAFLDAVRSG-EVLSARDALATHELCERVVRA 295
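
Each alignment is introduced by a summary line giving Identities, Positives and, where present, Gaps as count/length pairs with a percentage. The sketch below shows one way to extract those fields; the name `parse_summary` is hypothetical. The percentages are kept as printed rather than recomputed, since the rounding rule used by the original program is not stated here.

```python
import re

# One "Name = count/length (pct%)" field of a BLAST summary line.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_summary(line):
    """Return {field: {"count", "length", "reported_pct"}} for a summary line."""
    fields = {}
    for name, count, length, pct in FIELD.findall(line):
        fields[name] = {
            "count": int(count),
            "length": int(length),
            "reported_pct": int(pct),
        }
    return fields

summary = parse_summary(
    "Identities = 91/299 (30%), Positives = 147/299 (48%), Gaps = 7/299 (2%)"
)
for name, values in summary.items():
    print(name, values["count"], "of", values["length"])
```

A line without a Gaps field simply yields no "Gaps" key, which matches the gap-free alignments in this section.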

A related DNA sequence was identified in S.pyogenes <SEQ ID 3387> which encodes the amino acid 
sequence <SEQ ID 3388>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAF96942 GB:AE004430 oxidoreductase, Gfo/Idh/MocA family [Vibrio cholerae]
Identities = 103/304 (33%), Positives = 158/304 (51%), Gaps = 11/304 (3%)

Query: 4 LNIGIVGLGAISQKAYLPYMRQLSDITWHLSTRNAAVRQQVGQLFGHAILYSDVKELSKT 63 

+ I ++GLG I+QKAYLP + Q DI L TRN V + + + +D +++ + 
Sbjct: 1 MKIAMIGLGDIAQKAYLPVLAQWPDIELVLCTRNPKVLGTIATRYRVSATCTDYRDVLQY 60 
Query: 64 NLDGVFIHAATSAHAELASLFLNQGIPVFMDKPIADNYLMTKNLYDLAKENQTFLMAGFN 123
+D V IHAAT H+ LA+ FL+ GIP F+DKP+A + +NLY+LA+++ L GFN
Sbjct: 61 GVDAVMIHAATDVHSTIiAAFFLHLGIPTFVDKPLAASAQECENLYELAEKHHQPLYVGFN 120

Query: 124 RRFTPRVKK-LSSLSTK RKVAVEKNDIiNRPGDMTFKLFDFFIHPLDTALFLTEGT 177
RR P + LS L+ + R + EK+ PGD+ +FD FIHPLD+ +
Sbjct: 121 RRHIPLYNQHLSEIiAQQECGALRSLRWEKHRHALPGDIRTFVFDDFIHPLDSVNLSRQCN 180

Query: 178 LLKGHFQYHLEAGLLSQVMVTLMTESMTTTASMNLQSGSRREVMEVQRAEETYHLENLDE 237
L H YH+ GLL+++ V T ASMN O G E + Y ++ +
Sbjct: 181 LDDLHLTYHMSEGLLARLDVQWQTGDTLLHASMNRQFGITTEHVTASYDNVAYLFDSFTQ 240

Query: 238 LSIYKGTEKRVLGFASWDTTLHKRGFETMIDAFLEAISTGVNPVS-PESSLLSHWI 292
+++ ++ + W L +GF+ M+ +L+ + G P E +L SH I
Sbjct: 241 GKMWRDNQESRVALKDWTPMIASKGFDAMVQDWLQVARAGKLPTHIIERNLASHQLAEAI 300

Query: 293 CQQI 296
CQQI
Sbjct: 301 CQQI 304

An alignment of the GAS and GBS proteins is shown below. 

Identities = 168/308 (54%), Positives = 223/308 (71%)

Query: 1 MLQLGIVGLGGISQKAYLPYMRQVTGVHWHLFTRQKQILEEvNMLFGSSTAYDSLDSIAE 60
ML +GIVGLG ISQKAYLPYMRQ++ + WHL TR + ++V LFG + Y + L++
Sbjct: 3 MLNIGIVGLGAISQKAYLPYMRQLSDITWHLSTRNAA.VRQQVGQLFGHAILYSDVKELSK 62

Query: 61 HPLDGVFIHVATSAHFDIAKLFLKKGIPVFMDKPLTEDYTSTKALYDLAKDHKTFLMAGF 120
LDGVFIH ATSAH ++A LFL +GIPVFMDKP+ ++Y TK LYDLAK+ + +TFLMAGF
Sbjct: 63 TNLDGVFIHAATSAHAELASLFLNQGIPWMDKPIAD^ 122

Query: 121 NRRFAPRIMEMKKVEDKNHIRTFKNAVl^PM)FQYlCLFDMFIHPLDTMiFLTNNVVKRGY 180
NRRF PR+ ++ + K + KN +N P D +KLFD FIHPLDTALFLT + +G+
Sbjct: 123 NRRFTPRVKKLSSLSTKRKVAVEKiroi^PGDMTFKIjFDFFIHPLDTALFLTEGTLLKGH 182

Query: 181 FVTKRDGNKILQVSVTLETDSEIIEASMNLQSGSRREIIEIESPEVTYSLDDLSNLSVID 240
F + + QV VTL T+S ASMNLQSGSRRE++E++ E TY L++L LS+
Sbjct: 183 FQYHLFAGLLSQVMVTLMTESMTTTASMNLQSGSRREVMEVQRAEETYHLENLDELSIYK 242

Query: 241 GFDRRAIGFGSWASTLEKRGFEPMIDAFIQAITTGVNPISPKSSLLSHFICDQINKANAP 300
G ++R +GF SW +TL KRGFE MIDAF++AI+TGVNP+SP+SSLLSH+IC QI +
Sbjct: 243 GTEKRVLGFASWDTTLHKRGFETMIDAFLEAISTGVNPVSPESSLLSHWICQQIADSQLS 302

Query: 301 FGMLNLKI 308
+G L +++
Sbjct: 303 YGELTVEL 310

SEQ ID 3386 (GBS309) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 68 (lane 10; MW 63kDa). 

GBS309-GST was purified as shown in Figure 212, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1096 

A DNA sequence (GBSx1171) was identified in S.agalactiae <SEQ ID 3389> which encodes the amino
acid sequence <SEQ ID 3390>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2983 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB04222 GB:AP001508 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 52/129 (40%), Positives = 70/129 (53%), Gaps = 5/129 (3%)

Query: 39 FEDWLDHNLNMELGVGVPDNFVPYIQFVSFDNDNNAIGFLNLRLRLNDTLLEKGGHIGYS 98

FE L + + GV +P N V + IG +N+R IjND h +GGHIGY 

Sbjct: 43 FEHLLKTLKDYQHGVNLPANRVANTTYV^VHEQKRLIGAINIRHTLiroWLHHRGGHIGYG 102 

Query: 99 IRPRQRGKGYAKEQLKLGIEQAHLKNINEILVTCHVDNDASKSVILANGGVLEDCLHQ-- 156 
IRP +RGKGYA LKLG+E+A + ++L+TC +N S I NGGVL+ +

Sbjct: 103 IRPSERGKGYATLMLKLGLEKAAALGLEKVLITCDKENLPSARTIQRNGGVLDSEVVDER 162 

Query: 157 TERYWI 162 

+RYWI 

Sbjct: 163 GIAIQRYWI 171

A related DNA sequence was identified in S.pyogenes <SEQ ID 3391> which encodes the amino acid
sequence <SEQ ID 3392>. Analysis of this protein sequence reveals the following:

Possible site: 59
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2195 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 90/164 (54%), Positives = 115/164 (69%), Gaps = 4/164 (2%)

Query: 1 MKLRRPVliEDKEEIlAMYKEFQKESSSVDG--GFYEPTMHFEDVaDHNIJWIELGVGVPDN 58

M++RRP L+DK+ +L+M EF ++ S+ DG F ++E WL+ +L E+G+ 

Sbjct: 1 MEIRRPTLKDKDAVLSMINEFLEQKSATDGLTOFNVNDFNYETWLEDSLRQEMGLS--SQ 58 

Query: 59 FVPYIQWSFDNDNNAIGFIJSILRLRLNDTLLEKGGHIGYSIRPRQRGKGYAKEQLKLGIE 118 
VP IQ+V+FD + AIGFLNLRLRLN+ LLEKGGHIGYS+RP QRGKGYAKE LK +

Sbjct: 59 GVPAIQYVAFDERSQAIGFLNLRLRLNERLLEKGGHIGYSVRPSQRGKGYAKEMLKQAVS 118 

Query: 119 QAHLKNINEILVTCHVDNDASKSVILANGGVLEDCLHQTERYWI 162 
A KNI ILVTC N AS++VI+AN G+LED TERYWI 
Sbjct: 119 YAISKNITTILVTCDETNVASRAVIVANVGILEDSRGGTERYWI 162
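
Every Query and Sbjct row carries its own start and end coordinates, so the number of residues on the row (gap characters excluded) should equal end minus start plus one. The sketch below uses that relation as a sanity check for transcription damage; `row_is_consistent` is a hypothetical helper, and the check assumes a row of the form "Label: start residues end" with no spaces inside the residue string.

```python
import re

# "Query: 119 RESIDUES 162" or "Sbjct: 119 RESIDUES 162"
ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")

def row_is_consistent(row):
    """True if the residue count (ignoring '-') matches the row's coordinates."""
    m = ROW.match(row.strip())
    if not m:
        return False  # malformed row, e.g. spaces inside the sequence
    _, start, residues, end = m.groups()
    residue_count = sum(1 for ch in residues if ch != "-")
    return residue_count == int(end) - int(start) + 1

rows = [
    "Query: 119 QAHLKNINEILVTCHVDNDASKSVILANGGVLEDCLHQTERYWI 162",
    "Sbjct: 119 YAISKNITTILVTCDETNVASRAVIVANVGILEDSRGGTERYWI 162",
]
for row in rows:
    print(row_is_consistent(row))
```

Both rows of the final alignment block above pass. A row in which a residue has been split into two characters, or dropped, fails the check, although the check cannot say which residue is wrong.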

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1097 

A DNA sequence (GBSx1172) was identified in S.agalactiae <SEQ ID 3393> which encodes the amino
acid sequence <SEQ ID 3394>. This protein is predicted to be anaerobic ribonucleotide reductase activator
protein (nrdG). Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4239 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD00216 GB:U73336 anaerobic ribonucleotide reductase activator
protein [Lactococcus lactis subsp. cremoris]

Identities = 152/198 (76%), Positives = 176/198 (88%)

Query: 8 NTPKPGEWKSEELSHGHIIDYKAFNFVDGEGVRNSLYVAGCMFHCKGCYNTATWSFRAGI 67 
N PKPGEW+ + +ELS +1 DYK FNFVDGEGVR SLYV+GCMFHC+GCYN ATWSFR G 
Sbjct: 2 NNPKPGEWRADELSQISTYIADYKPFNFVDGEGVRCSLYVSGCMFHCEGCYNQATWSFRYGR 61

Query: 68 PYTKELEDQIMTDLEQPWCGLTLLGGEPFLNTGILLPLLQRIRRELPEKDIWSWTGYTW 127 

PYTKELED+IM DL +PYVQGLTLLGGEPFMIT L+PLL+RIRRELP+KDIWSWTGYTW 
Sbjct: 62 PYTKELEDKIMADLAEPWCGLTLLGGEPFLNTTFLIPIiLKRIRRELPDKDIWSWTGYTW 121 


Query: 128 EEMMLETQDKLEMLSLIDILVDGRFDQSKRNLMLQFRGSSNQRIIDVQKSLKEGEWIWE 187 

EEMMLET DKLEML L+D+LVDGRF+ SK+NLMLQFRGSSNQRIIDV KS +G+WIWE 
Sbjct: 122 EEMMLETDDKLEMLDLLDVLVDGRFELS KKNLMLQFRGSSNQRI IDVPKSRSKGQWIWE 181 

Query: 188 GLNDGDNSYEQVKRDDLL 205

LNDG+N++EQ+ ++ L+ 
Sbjct: 182 KLNDGENNFEQIHKEKLI 199 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3395> which encodes the amino acid
sequence <SEQ ID 3396>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4111 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 167/202 (82%), Positives = 186/202 (91%)

Query: 4 EASWNTPKPGEWKSEELSHGHIIDYKAFNFVDGEGVRNSLYVAGCMFHCKGCYNTATWSF 63 

E WN PKP EW++EELS G IIDYKAFNFVDGEGVRNSLYV+GC+FHCKGCYN ATWSF 
Sbjct: 4 EKCWNNPKPKEWQAEELSQGRIIDYKAFNFVDGEGVRNSLYVSGCLFHCKGCYNAATWSF 63 


Query: 64 RAGIPYTKELEDQIMTDLEQPYVQGLTLLGGEPFIOTGILLPLLQRIRRELPEKDIWSWT 123 

+AG+PYT+ELE+QIMTDL QPYVQGLTLLGGEPFLNTGIL+PL++RIRRELPEKDIWSWT 
Sbjct: 64 KAGMPYTQELEEQIMTDLAQPYVQGLTLLGGEPFLNTGILIPLIKRIRRELPEKDIWSWT 123 

Query: 124 GYTWEEMMLETQDKLEMLSLIDILVDGRFDQSKRNLMLQFRGSSNQRIIDVQKSLKEGEV 183

GYTWEEMMLET DKLEMLSLIDILVDGRFD +K+NLMLQFRGSSNQRI IDVQKSL EV 
Sbjct: 124 GYTWEEMMLETPDKLEMLSLIDILVDGRFDITKKNLMLQFRGSSNQRI IDVQKSLAAKEV 183 

Query: 184 VIWEGLNDGDNSYEQVKRDDLL 205 
+IW+ LNDGD ++EQ+ R+DLL

Sbjct: 184 I IWDKLNDGDQTFEQISREDLL 205 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1098

A DNA sequence (GBSx1173) was identified in S.agalactiae <SEQ ID 3397> which encodes the amino
acid sequence <SEQ ID 3398>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.03 Transmembrane 102 - 118 ( 101 - 119)

Final Results

bacterial membrane --- Certainty=0.2211 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD24446 GB:AF118389 unknown [Streptococcus suis] 
Identities = 97/240 (40%), Positives = 151/240 (62%), Gaps = 1/240 (0%)

Query: 2 IKILIPTAKEMKV-CQNIAWPKLSAQTKIIIDYFSTLTVSDLEDIYRINTSAARCEAQRW 60 

+KI+IP AKE+ +N ++ LS ++K ++D S V + Y++N + A EA RW 
Sbjct: 1 MKI 1 1 PNAKEVNTNLENASFYLLSDRSKPVLDAI SQFDVKKMAAFYKLNEAKAELEADRW 60 


Query: 61 QDFKAKQLTLNPAIKLFNGLMYRNIKRHNLSTSEAQFMENSVFITSALYGIIPAMTLISP 120 

+ Q PA +L++GLMYR + R + + E +++V + +ALYG+I ISP 
Sbjct: 61 YRIRTGQAKTYPAWQLYDGLMYRYMDRRGIDSKEENYLRDHVRVATALYGLIHPFEFISP 120 

Query: 121 HRLDFNTKIKINNNSLKVFWRENYDTFMQSDDIMVSLLSNEFETVFSPKERQKLIHLNFI 180

HRLDF +KI N SLK +WR YD + D++++SL S+EFE VFSP+ +++L+ + F+ 
Sbjct: 121 HRLDFQGSLKIGNQSLKQYWRPYYDQEVGDDELILSLASSEFEQVFSPQIQKRLVKILFM 180 

Query: 181 EDRDGQLKTHSTISKKARGKCLTAMMENNCQTLEHLKQLRFDGFCYDNELSDSKQLTFVK 240 
E++ GQLK HSTISKK RG+ L+ + +NN Q L ++ + DGF Y S + QLTF++

Sbjct: 181 EEKAGQLKVHSTISKKGRGRLLSWLAKNNIQELSDIQDFKVDGFEYCTSESTANQLTFIR 240 

A related GBS nucleic acid sequence <SEQ ID 10941> which encodes amino acid sequence <SEQ ID
10942> was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 3399> which encodes the amino acid
sequence <SEQ ID 3400>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3759 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 114/242 (47%), Positives = 155/242 (63%)

Query: 1 MIKILIPTAKEMKVCQNIAWPKLSAQTKIIIDYFSTLTVSDLEDIYRINTSAARCEAQRW 60
M+ LIPTAKEM + + L ++ 1+ + +T DL YRI +A+ E QRW 

Sbjct: 1 MLTFLIPTAKEMTIPKESHPHLLPQDSQAILKIMAAMTTEDLAKSYRIKEESAKKEQQRW 60

Query: 61 QDFKAKQLTLNPAIKLFNGLMYRNIKRHNLSTSEAQFMENSVFITSALYGIIPAMTLISP 120 

QD ++Q PA +LFNGLMYR+ 1 KR L+T E ++ V+ITS+ YGIIPA 1+ 
Sbjct: 61 QDMASQQSIAYPAYQLFNGLMYRHIKRDKLTTQEQAYLTQQVYITSSFYGIIPANHPIAE 120 


Query: 121 HRLDFNTKIKINNNSLKVFWRENYDTFMQSDDIMVSLLSNEFETVFSPKERQKLIHLNFI 180 

HR DF+T+IKI SLK +WR Y+ F + ++SLLS+EF+ VFS +Q I F+ 
Sbjct: 121 HRHDFHTRIKIEGQSLKSYWRPCYNQFAKEHPQVISLLSSEFDDVFSKDCKQLWISPKFM 180 

Query: 181 EDRDGQLKTHSTISKKARGKCLTAI#1F^CQTLEHLKQLRFDGFCYDNELSDSKQLTFVKKQ 242

+++GQ KTHSTISKKARG LTA MENNCQT++ LK L F GF Y +LS + ++KK+ 
Sbjct: 181 AEKEGQFKTHSTISKKARGAFLTACMElWCCTvDSLKSLVFAGFYYHPDLSTDHEFVYIKKK 242 
SEQ ID 3398 (GBS428) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 80 (lane 6; MW 30.4kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 173 (lane 4; MW 55kDa).

GBS428-GST was purified as shown in Figure 220, lanes 6-7.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1099 

A DNA sequence (GBSx1174) was identified in S.agalactiae <SEQ ID 3401> which encodes the amino
acid sequence <SEQ ID 3402>. Analysis of this protein sequence reveals the following:

Possible site: 23

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.59 Transmembrane 3 - 19 ( 3 - 19)

Final Results

bacterial membrane --- Certainty=0.1235 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10251> which encodes amino acid sequence <SEQ ID
10252> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB07024 GB:AP001518 unknown conserved protein [Bacillus halodurans] 
Identities = 86/275 (31%), Positives = 143/275 (51%), Gaps = 6/275 (2%)

Query: 17 MSYPYKANHSIESITLKVNDLENLVNFYSDIIGLTVIDKSSTRALLGVNQKIPLIILEKT 76

M + + N ++ + +KV+DL + FY +IIG V+++S A L N + PL+++E+ 
Sbjct: 1 MEFHRQPNTFVDLVNIKVSDLSRALTFYQEIIGFQVLERSERSATLTANGRTPLLVIEQP 60 

Query: 77 E---LEKHSTYGLYHTAILVPDEYHLSLALNHLLSQHIPLEGGADHGYSNAIYLSDPEGN 133 
+ ++ T GLYH A+L+P L LNHLL PL+G +DH S AIY +DP+GN

Sbjct: 61 DPVIAKQPRTTGLYHFALLLPSRADLGRFLNHLLQSGYPLQGASDHLVSEAIYFADPDGN 120 

Query: 134 GIEIYNDKDISMWDIRESGQIIGITERLDIDNLLDSLVNVPNNYKLSEKTSIGHIHLSVK 193 
G+E+Y D+ S WD +G++ TE + +NLL + P L +T +GHIHL V 
Sbjct: 121 GVEVYADRPSSSWD-WSNGEVKMSTEPIHAENLLAEGKDEPWT-ALPPETILGHIHLHVA 178

Query: 194 DAKISSKLYQNVFGLDEKFAIPT-ASWIASGNYHHHLAFNNWAGPNLSKNQEDRPGISLL 252 

+ + Y G + + A +1++GNYHHH+ N W G E G+ 

Sbjct: 179 mFEAETFYIEGLGFNWARLGNQALFISTGNYHHHIGIjNTWNGVGAPTPPEHSVGLKWF 238 






Query: 253 TIAYNDDNLFRDSLKKAQLYQLTFLEKQDHYYIIE 287 

++ Y + + ++ + + K ++I+ 

Sbjct: 239 SLTYPSEEVRAKTVNRLETIGFQVERKHGEEWVID 273 



A related DNA sequence was identified in S.pyogenes <SEQ ID 3403> which encodes the amino acid
sequence <SEQ ID 3404>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0936 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 143/282 (50%), Positives = 194/282 (68%)

Query: 17 MSypyKMmSIESITLKOTJDLENLVNFYSDIIGLTVIDKSSTRALLGVNQKIPLIILEKT 76 
M YPY + S+ +++L VDL + FY+ IIGL V+ + +T L + K ++ L +T

Sbjct: 1 MIYPYNSTISLGWSLNVTDLAKMTTFYTSIIGLQVIiSQDTTSRQLTTDGKTVILELRQT 60 

Query: 77 ELEKHSTYGLYHTAILVPDEYHLSLALNHLLSQHIPLEGGADHGYSNAIYLSDPEGNGIE 136 
L YGLYHTA LVPD + L h LNH L++ I LEG ADHG+S AIYLSDPEGNGIE 

Sbjct: 61 PLPGDKAYGLYHTAFLVPDRHSLGLVLNHFLTRSISLEGAADHGHSEAIYLSDPEGNGIE 120

Query: 137 IYNDKDISMVTOIRESGQIIGITERLDIDMLiLDSLVlWPNl^KLSEKTSIGHIHLSVKDAK 196 

IY+DK + WDIR++GQI IG+TE D ++L+ L ++P ++ L++ T I H+HLSVK+A 
Sbjct: 121 IYHDKAVEHWDIRDNGQIIGVTEPTDTKSILEQLTDIPKHFLLAQDTRIRHVHLSVKNAL 180 

15 

Query: 197 ISSKLYQNVFGLDEKFAIPTASWIASGNYHHHLAFNNWAGPNLSKNQEDRPGISLLTIAY 256 

SS LYQ VF L +K IP+ASWIASGNY+HHLAFN+W+ P L K+QE PG++ LTI 
Sbjct: 181 ASSLLYQKVFDLGDKMTIPSASWIASGNYYHHLAFNHWSAPYLKKHQEGAPGLAFLTIHI 240 

Query: 257 NDDNLFRDSLKKAQLYQLTFLEKQDHYYIIEDFDGIRIKWL 298

LF +LKKA+L+ L L++ + ED +GIR+ V+L 
Sbjct: 241 ETPLLFSATLKKARLHGLAILQEDSSSFTTEDEEGIRVNVIL 282 

SEQ ID 3402 (GBS429) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 80 (lane 7; MW 34.2kDa).

GBS429-His was purified as shown in Figure 214, lane 9. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1100 

A DNA sequence (GBSx1175) was identified in S.agalactiae <SEQ ID 3405> which encodes the amino
acid sequence <SEQ ID 3406>. Analysis of this protein sequence reveals the following:

Possible site: 59

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2362 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10249> which encodes amino acid sequence <SEQ ID
10250> was also identified.

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC21682 GB:U32686 conserved hypothetical protein [Haemophilus influenzae Rd] 
Identities = 89/261 (34%), Positives = 151/261 (57%), Gaps = 4/261 (1%)

Query: 10 MVRLIFSDIDGTLINSNFKVTPKTRQGIKQIVAQGATFVPISARMPEAITPIMEQIGIDS 69 

M + +FSD +GTL+ S ++P+T IK++ A G FVPISAR PIP +Q+ ++ 
Sbjct: 2 ^KAVFSDFNGTLLTSQHTISPRTVWIKRLTANGIPFVPISARSPLGILPYWKQLETNN 61 

Query: 70 YIISYNGALIQDMQQKTIASHTMDGQVALQVCSYVSKHYSKIAWNVYRYHEWYSCDKENE 129

+++++GALI + + I S ++ + L++ + +++H + NY ++ ++ D EN+ 
Sbjct: 62 VLVAFSGALII^QNLEPIYSVQIEPKDILEINTVIiAEH-PL 120 

Query: 130 WVQKEEEIVGLQSKEMSLMELEKQDRIHKLLLMGEPSLMGELENTLKAQYPHLSIAQSAP 189 
WV E + ++ + HK+ ++GE + E+E LK ++PHLSI +S

Sbjct: 121 WVIYERSVTKIEIHPFDEVATRSP HKIQIIGEAEEIIEIEVLLKEKFPHLSICRSHA 177 



Query: 190 YFIEIMAPGIEKGKSAKTLADYLDISLADSIAFGDNYNDIiNLLEIVGKGFVMGNAPKDLQ 249

F+E+M KG + + L DY + + IAFGDN+NDL++LE VG G MGNAP +++ 

Sbjct: 178 NFLEVMHKSATKGSAVRPLEDYFGVQTNEVIAFGDNFMDLDMLEHVGLGVAMGNAPNEIK 237 

Query: 250 ERIGNVTQDNDNDGIYYALVE 270 

+ VT N+ DG+ L E 
Sbjct: 238 QAANWTATNNEDGLAL1LEE 258 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 1101 

A DNA sequence (GBSx1176) was identified in S.agalactiae <SEQ ID 3409> which encodes the amino
acid sequence <SEQ ID 3410>. Analysis of this protein sequence reveals the following:

Possible site: 19

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG07223 GB:AE004801 hypothetical protein [Pseudomonas aeruginosa] 
Identities = 103/283 (36%), Positives = 165/283 (57%), Gaps = 1/283 (0%)

Query: 33 KHIGILQYVEHPSLTATRKGFIKEIAKEGYKDGKNIKIEYKMAQGDQSNIQSISEKLIKD 92 

K + + VEHP+L A R G + L + GY+DGKN+K +Y++AQG+ 1+ K I D 

Sbjct: 31 KSVAVTAIVEHPALDAARDGVTCEALQEAGYEDGKNLKWQYQSAQGNTGTAAQIARKFIGD 90 


Query: 93 NK-LVLGIATPAAQSLTTVSTETPILFTAVTDPVSAELVKSMKKPEGLATGTSDMSPIKK 151 

+++GIATP+AQ+L + PI+F+ VTDPV A L S + TG SDM + K 

Sbjct: 91 KPDVIVGIATPSAQALVAATKS I PIVFSTVTDPVGAHLTPSWEASGTNVTGVSDMLALDK 150 

Query: 152 QVSLLRKVMPKVKRVGIMYTTSERNSEVQVKQAKKIFQEAGIKTSVKGISSTNDVQDTAK 211

Q+ L++KV+P KR+G++Y E NS V VK+ K++ + G+ + DV A+ 

Sbjct: 151 QIELIKKVVPGAKRIGMVYNPGEANSVVWKEJjKELLPKMGLSLVEASAPRSVDVSSAAR 210 

Query: 212 SLMSKTEVIFVPTDNIIASSVTLLGNLSKELKVPWGGSADMVPSGLLFSYGADYEALGR 271 
SL+ K + 1+ TDN + S+ L + + K+P++ D V G + + G +Y+ +G+

Sbjct: 211 SLVGKVDAIYTNTDNNWSAYEALVKVGNDAKIPLIASDTDSVKRGAIAALGINYKEMGK 270 

Query: 272 QTARQAVKILKGKDVAKVPSEYPQNLKWVNEDMAKELGIDVS 314 
QT R V+ILKG+ ++ E NL++ VN A++ G+ +S 
Sbjct: 271 QTGRMWRILKGEKPGEIKPETSDNLQLFVNPGAAQKQGVTLS 313

There is also homology to SEQ ID 2712. 

SEQ ID 3410 (GBS188) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 39 (lane 2; MW 36.6kDa). 

The GBS188-His fusion product was purified (Figure 204, lane 6) and used to immunise mice. The
resulting antiserum was used for Western blot (Figure 247), FACS, and in the in vivo passive protection
assay (Table III). These tests confirm that the protein is immunoaccessible on GBS bacteria and that it is an
effective protective immunogen.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1102 

A DNA sequence (GBSx1177) was identified in S.agalactiae <SEQ ID 3411> which encodes the amino
acid sequence <SEQ ID 3412>. This protein is predicted to be probable permease of ABC transporter
(rbsC). Analysis of this protein sequence reveals the following:

Possible site: 21

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane --- Certainty=0.7453 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG07224 GB:AE004801 probable permease of ABC transporter 
[Pseudomonas aeruginosa] 
Identities = 114/285 (40%), Positives = 175/285 (61%), Gaps = 3/285 (1%)

Query: 5 ILSGISQGLLWSIMAIGVFITFRILDIADLSAEGAFPMGAAVCALCIVNDIWPIVATIAG 64 

+ + GL++S++A+GVFI+FR+L DL+ +G+FP+G AVCA I +P AT+A 
Sbjct: 6 LFGALEIGLIFSLVALGVFISFRLLRFPDLTVDGSFPLGGAVCATLIALGWDPYSATLAA 65 


Query: 65 MLGGMLAGLVSGFLHTKMKIPALLTGIITLTGLYSINLLVLGRSNVSFALKNTLVTMVTR 124 

G LAGL +G L+ K+KI LL 1+ + LYSINL ++G+ NV + TL T++ 
Sbjct: 66 TAAGALAGLATGLLNVKLKIMDLLASILMMIALYSINLRIMGKPNVPLIAEPTLFTLLQP 125 

Query: 125 LGMKLSAVLLIGIVCVGLVILILYLFLNTQLGLALRATGDNEAMGQANSIKVDRMKMLG 184

L+ L+ + V L+L F TQ GLA+RATG N M +A + M +LG 

Sbjct: 126 EWLSDYVFRPLLLVFIVIAAKLLLDWFFTTQKGLAIRATGSNPRMARAQGVNTGGMILLG 185 

Query: 185 YMIGNGLIALSGALIiAQNNGYADIJSIMGVGTIVIGLASIILAEvMIKyLPLGKRLWSIVLG 244 
IN L+AL+GAL AQ G AD++MG+GTIVIGLA++I+ E ++ L +++LG

Sbjct: 186 MAISNALVALAGALFAQTQGGADISMGIGTIVIGLAAVIVGESILPSRRLILATLAVILG 245 

Query: 245 SVLYRMI IVFILTTD IDAQMIKLVSAILLALILYVPELRAKL 286 

+++YR I L +D + AQ + LV+A+L+ + L +P ++ +L 
Sbjct: 246 AIVYRFFIALALNSDFIGLQAQDLNLVTAVLVTVALVIPMMKKRL 290

There is also homology to SEQ ID 2716. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1103

A DNA sequence (GBSx1178) was identified in S.agalactiae <SEQ ID 3413> which encodes the amino
acid sequence <SEQ ID 3414>. This protein is predicted to be ABC transporter. Analysis of this protein
sequence reveals the following:

Possible site: 41
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3798 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF86640 GB:AF162694 ABC transporter [Enterococcus gallinarum] 
Identities = 171/264 (64%), Positives = 213/264 (79%), Gaps = 1/264 (0%) 

Query: 3 LLELVNLHKTFEKGTVNENHVLRGLDLTIEDGDFISVIGGNGAGKSTLLNCIAGLIPIDQ 62
+L + +LH+TFEKGT+NENHVLRG+DLT+ GDFI ++ IGGNGAGKSTLLN IAG IP +Q
Sbjct: 5 VLTISDLHQTFEKGTINENHVLRGIDLTMNSGDFITIIGGNGAGKSTLLNSIAGTIPTEQ 64

Query: 63 GAITLDNQSITKDSVEKRSKDISRVFQDPRMGTATNLTIEENMAIAHKRGNKRHIFRQSV 122
G I L ++ IT+ SV +RSK+ISRVFQDPRMGTA LT+EEN+A+A+ KRG R F V
Sbjct: 65 GKIVLGDKEITRHSVTRRSKEISRVFQDPRMGTAWLTvEENLALAYKRGQVRG-FSSGV 123

Query: 123 TDDDRQLFKKSLSQLGLGLENRMKTDAAFLSGGQRQALTLAMATLVRPKLLLLDEHTAAL 182
R FK+ L++L LGLENR+ T+ LSGGQRQA+TL MATL +PKL+LLDEHTAAL
DPKTS VM LT ++I+EQ+LTA M+TH+ME AI YGNRL+ML+ GKIWD+ GE K++L

TV +LM LFH+NSG +L DD L+L

Sbjct: 244

There is also homology to SEQ ID 2720:

Identities = 116/249 (46%), Positives = 166/249 (66%), Gaps = 1/249 (0%)

Query: 3 LLELVNLHKTFEKGTVNENHVLRGLDLTIEDGDFISVIGGNGAGKSTLLNCIAGLIPIDQ 62
++EL+N + G + +L + LTI + DF++++GGNGAGKSTL N IAG + + +
Sbjct: 4 IIELINATVDVDNGFEDAKTILDNVTLTIYEHDFLTILGGNGAGKSTLFNVIAGTLSLTR 63

Query: 63 GAITLDNQSITKDSVEKRSKDISRVFQDPRMGTATNLTIEENMAIAHKRGNKRHIFRQSV 122
G I + Q +T EKR+ +SRVFQD +MGTA +T+ EN+ IA +RG KR + + +
Sbjct: 64 GQIRILGQDVTHTOAEKRALYLSRVFQDSKMGTAPRMTVAENLLIARQRGGKRSIASRKI 123
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 1104 

A DNA sequence (GBSx1179) was identified in S.agalactiae <SEQ ID 3415> which encodes the amino
acid sequence <SEQ ID 3416>. This protein is predicted to be mannose-specific phosphotransferase system
component IIAB. Analysis of this protein sequence reveals the following:

Possible site: 54

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3527 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD46485 GB:AF130465 mannose-specific phosphotransferase system
component IIAB [Streptococcus salivarius]

Identities = 287/336 (85%), Positives = 306/336 (90%), Gaps = 6/336 (1%)

Query: 1 MGIGIIIASHGKFAEGIHQSGSMIFGEQEKVQWTFMPNEGPDDLYGHFNNAIAQFDADD 60
MGIGI I IASHGKFAEGIHQSGSMI FG+QEKVQWTFMP+EGPDDLY HFN+AIAQFDADD
Sbjct: 1 MGIGIIIASHGKFAEGIHQSGSMIFGDQEKVQWTFMPSEGPDDLYAHFNDAIAQFDADD 60

Query: 61 EVLVLADLWSGSPFNQASRVMGENPERKNmilTGLNLPMLIQAYTERMMDANAGVEQVAA 120 

E+LVLADLWSGSPFNQASR+ GENP+RK+AIITGLNLPMLIQAYTERMMDANA EQVAA 
Sbjct: 61 EILVLADLWSGSPFNQASRIAGENPDRKIAIITGLNLPMLIQAYTERMMDANATAEQVAA 120 


Query: 121 NIIKESKEGIKALPEELNPVVEATPVAGVPADVPAEVKQSGSIPEGTVIGDGKLKINLAR 180 

NIIKE+K GIKALPEELNP E T A V A P G+IPEGTVIGDGKLKINLAR 
Sbjct: 121 NIIKEAKGGIKALPEELNPAEETT-AAPVEAAAP QGAIPEGTVIGDGKLKINLAR 174 

Query: 181 IDTRLLHGQVATAWTPASKANRIIVASDEVSKDELRKQLIKQAAPGGVKANWPISKLIE 240 

+DTRLLHGQVAT WTPASKA+RI IVASD+V+KDELRK+LIKQAAP GVKANWPI KLI + 
Sbjct: 175 LDTRLLHGQVATNWTPASKADRIIVASDDVAKDELRKELIKQAAPNGVKANVVPIQKLID 234 

Query: 241 VAKDPRFGNTPJUjILFEWQDALRAIEGGVEIPELIWGSMAHSTGKTMvNISrVLSMDKDDV 300 
+KDPRFGNT AL1LFETVQDALRAIEGGV I ELNVGSMAHSTGKTMVNNVLSMDKDDV 

Sbjct: 235 ASKDPRFGNTHALILFEWQDRLRAIEGGVPIKELOTGSMRHSTGKTMVNNVLSMDKDDV 294 

Query: 301 AAFEKLRDLGVSFDVRKVPNDAKKNLFDLINKANVK 336 
A FEKLRDLGV FDVRKVPND+KK+LFDLI KANV+ 
Sbjct: 295 ACFEKLRDLGVEFDVRKVPNDSKKDLFDLIKKANVQ 330 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3417> which encodes the amino acid 
sequence <SEQ ID 3418>. Analysis of this protein sequence reveals the following: 

Possible site: 14 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3533 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 288/336 (85%), Positives = 308/336 (90%), Gaps = 6/336 (1%) 

Query: 1 MGIGI IIASHGKFAEGIHQSGSMIFGEQEKVQWTFMPNEGPDDLYGHFNNAIAQFDADD 60 

MGIGIIIASHGKFAEGIHQSGSMIFGEQEKVQWTFMPNEGPDDLYGHFNNAI QFDADD 
Sbjct: 1 MGIGI I IASHGKFAEGIHQSGSMI FGEQEKVQWTFMPNEGPDDLYGHFNNAIQQFDADD 60 

Query: 61 EvLVLADLWSGSPFNQASRVMGENPERKMAIITGLNLPMLIQAYTERMMDANAGVEQVAA 120 
E+LVLADLWSGSPFNQASRV GENP+RKMAIITGIiNLPMLIQAYTER+MDA AGVEQVAA 

Sbjct: 61 EILVLADLWSGSPFNQASRVAGENPDRKMAIITGLNLPMLIQAYTERLMDAGAGVEQVAA 120 

Query: 121 NIIKESKEGIKALPEEmPVVEATPVAGVPADVPAEVKQSGSIPEGTVIGDGKLKINLAR 180 
NIIKESK+GIKALPE+LNPV E V + G+IP GTVIGDGKLKINLAR 

Sbjct: 121 NI I KESKDGI KALPEDLNP VEETAATEKWNAL QGAIPAGTVIGDGKLKINLAR 174 

Query: 181 IDTRLLHGQVATAWTPASKANRIIVASDEVSKDELRKQLIKQAAPGGvKANVVPISKLIE 240 

+DTRLLHGQVATAWTPASKA+RIIVASDEV++D+LRKQLIKQAAPGGVKANVVPISKLIE 
Sbjct: 175 VDTRLLHGQVATAWTPASKADRIIVASDEVAQDDLRKQLIKQAAPGGVKANVVPISKrilE 234 

Query: 241 VAKDPRFGNTRALILFEWQDALRAIEGGVEIPELNVGSMAHSTGKTWVMISrTOSMDKDDV 300 

+KDPRFGNT ALILF+T QDALRA+EGGVEI ELNVGSMAHSTGKTMVNNVLSMDK+DV 
Sbjct: 235 ASKDPRFGISPrHALILFQTPQDALRAVEGGVEINEI^GSMMSTGKTMVNNVLSMDKEDV 294 


Query: 301 AAFEKLRDLGVSFDVRKVPNDAKKNLFDLINKANVK 336 

A FEKLRDLGV+FDWKVPND+KKNL.F+LI K N+K 
Sbjct: 295 ATFEKLRDLGVTFDVRKVPNDSKKNLFELIQKTNIK 330 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1105 

A DNA sequence (GBSx1180) was identified in S.agalactiae <SEQ ID 3419> which encodes the amino 
acid sequence <SEQ ID 3420>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3873 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB06625 GB:AP001517 unknown conserved protein [Bacillus halodurans] 
Identities = 89/267 (33%), Positives = 139/267 (51%), Gaps = 3/267 (1%) 

Query: 3 KKI IAVnLDGTLLHNNNTISDYTftDTLRKVQAQGHKVI ITTGRPYRMALAHYLRLDLKTP 62 

+ +IA+DLDGTLL +N TIS T T++K + GH V+I+TGRPYR ++ +Y L L T 
Sbjct: 4 RHLIALDLDGTLLTDNKTISMKTKQTIQKftREAGHIWISTGRPYRASIQYYQELQLDTA 63 


Query: 63 MINFNGALTHIPEKKWAFERSATIDKKLLLETTjNLSDAIQADFIASEYRKNFYITMDNRD 122 

++NFNGA H P+ ++ + + +A IE ++Y+ D 

Sbjct: 64 IVNFNGAFVHHPKDSSFGTYHHPLELSTARQVIETCEAFDVSNIMVEVIDDYYLRY--YD 121 

Query: 123 KINPQLFGVNEITDKMALDOTKITRNPNALLMQTRHKDKYEIAKELRQHFNHELEVDSWG 182 

++ Q F + + + K+ +P +L+ + EL L ++ +WG 

Sbjct: 122 EDFIQTFTEGQGPVEHGNLLKKLRDDPTCVLIHPKDDHVSELRSLLDGAHAEVIDQRTWG 181 

Query: 183 GPLNILEFSPKGVNKAYALKHLLKSLNLSQENLIAFGDEHNDTEMLAFAHTGYAMKNANP 242 
P N++E G+NKA LK + + +E +IAFGDE ND EM+ +A G AM NA 

Sbjct: 182 APWNVIEIVKAGMNKAVGLKRIADYYQVPKERIIAFGDEDNDFEMIEYAGKGVAMANAID 241 

Query: 243 TLLPYADQQIQWTNEEDGVAKTLEKLL 269 
L A+ I +NE+DG+A LE+ L 
Sbjct: 242 PLKALAN-DITLSNEDDGIAVYLEEAL 267 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3421> which encodes the amino acid 
sequence <SEQ ID 3422>. Analysis of this protein sequence reveals the following: 

Possible site: 50 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4380 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 188/270 (69%), Positives = 224/270 (82%) 

Query: 1 MTKKIIAVDLDGTLLHNl^ISDYrjffiTLRKVQAQGHKVIITTGRPYRMALMJYLRLDLK 60 

MTKK+IA+DLDGTLLH++NTIS YT T++ VQ +GH VII+TGRPYRMAL +YL+L+LK 
Sbjct: 1 MTKKLI AIDLDGTLLHHDNTI STYTQKTIKAVQDKGHHVI I STGRPYRMALGYYLQLNLK 60 

Query: 61 TPMINFNGALTHIPEKKWAFERSATIDKKLLbETtNLSDAIQADFIASEYRKNFYITMDN 120 

TP+I FNGALTH+PE+KWA+E + T+DK LL L D Q DFIASEYRKN YITM N 
Sbjct: 61 TPIITFNGALTHMPEQKWAYEHNVTLDKGYLLRLLKYQDDFQMDFIASEYRKNVYITMTN 120 

Query: 121 RDKINPQLFGVlffilTDKMALDVTKITRNPNALLMQTRHKDKYELAKELRQHFNHELEVDS 180 
+ I+PQLFGV+EIT MAL++TKITRNPNALLMQT H+DKY LAK +R F E+E+DS 

Sbjct: 121 PESIDPQLFGVDEITQDMALEITKITRNPNALLMQTHHEDKYALAKNMRACFKDEIEIDS 180 

Query: 181 WGGPLNILEFSPKGVNKAYALKHLLKSIiNLSQE^IAFGDEHlTOTEMIiAFAHTGYAMKNA 240 
WGGPLNILE S K VNKAYAL +LL N+ +++LIAFGDEHNDTEMLAFA TGYAMKNA 
Sbjct: 181 WGGPIiNILEISSKNVNKAYALNYLLGIY]m)KKDLIAFGDEHM)TEM]^FAGTGYAMKNA 240 

Query: 241 NPTLLPYADQQIQWTNEEDGVAKTLEKLLL 270 

+P LLPYADQQ+ ++NEEDGVAK LE+L L 
Sbjct: 241 SPVLLPYADQQLNFSNEEDGVAKKLEELFL 270 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
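
The alignment summaries in these examples all follow the same "Identities = n/L (p%)" pattern, so they can be read mechanically. The following is a minimal sketch, not part of the original analysis, that parses one summary line and recomputes an integer percentage from the counts. The function names are hypothetical, and rounding down is an assumption that happens to match the GAS/GBS summary line of Example 1105; it is offered only as a cross-check on transcription.

```python
import re

# Matches fields such as "Identities = 188/270 (69%)"; tolerant of stray spaces.
STAT_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_stats(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {
        name.lower(): (int(count), int(length), int(pct))
        for name, count, length, pct in STAT_RE.findall(line)
    }


def truncated_percent(count, length):
    """Integer percentage rounded down (an assumed convention, see lead-in)."""
    return count * 100 // length


# Summary line of the GAS/GBS alignment in Example 1105.
stats = parse_alignment_stats("Identities = 188/270 (69%) , Positives = 224/270 (82%)")
for name, (count, length, reported) in stats.items():
    print(name, count, length, reported, truncated_percent(count, length))
```

A recomputed value that differs from the reported one is a prompt to recheck the printed figures, not evidence that either number is wrong.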

Example 1106 

A DNA sequence (GBSx1181) was identified in S.agalactiae <SEQ ID 3423> which encodes the amino 
acid sequence <SEQ ID 3424>. Analysis of this protein sequence reveals the following: 

Possible site: 39 

>>> Seems to have an uncleavable N-term signal seq 

Final Results 

bacterial membrane --- Certainty=0.3951 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 
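
Each "Final Results" block lists three candidate locations with a certainty value, and the location marked "Affirmative" is the one with the highest value. As an illustration only, the sketch below reads such a block and picks the top-scoring location. The function names and the regular expression are hypothetical; the pattern is deliberately tolerant of stray spaces inside the number, which is an assumption about how scanned copies of this listing may look.

```python
import re

# A location label, optional separator characters, then "Certainty=<value> (<verdict>)".
RESULT_RE = re.compile(
    r"^\s*(bacterial \w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(lines):
    """Return a list of (location, certainty, verdict) tuples."""
    results = []
    for line in lines:
        match = RESULT_RE.match(line)
        if match:
            location, raw_value, verdict = match.groups()
            # Remove any spaces that crept into the number, e.g. "0 . 3951".
            value = float(raw_value.replace(" ", ""))
            results.append((location, value, verdict.strip()))
    return results


def predicted_location(lines):
    """Return the location with the highest certainty, or None if nothing parsed."""
    results = parse_final_results(lines)
    return max(results, key=lambda r: r[1])[0] if results else None


block = [
    "bacterial membrane --- Certainty=0.3951 (Affirmative) < succ>",
    "bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>",
    "bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>",
]
print(predicted_location(block))
```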

Example 1107 

A DNA sequence (GBSx1182) was identified in S.agalactiae <SEQ ID 3425> which encodes the amino 
acid sequence <SEQ ID 3426>. Analysis of this protein sequence reveals the following: 

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2025 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1108 

A DNA sequence (GBSx1183) was identified in S.agalactiae <SEQ ID 3427> which encodes the amino 
acid sequence <SEQ ID 3428>. This protein is predicted to be an integral membrane protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -5.41 Transmembrane 180 - 196 ( 179 - 199) 

INTEGRAL Likelihood = -5.31 Transmembrane 96 - 112 ( 94 - 114) 

INTEGRAL Likelihood = -2.18 Transmembrane 129 - 145 ( 129 - 145) 

INTEGRAL Likelihood = -1.33 Transmembrane 37 - 53 ( 37 - 53) 



Final Results 

bacterial membrane --- Certainty=0.3166 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

A related GBS nucleic acid sequence <SEQ ID 8729> which encodes amino acid sequence <SEQ ID 8730> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 

McG: Discrim Score: 5.85 

GvH: Signal Score (-7.5): -2.39 
Possible site: 18 

>>> Seems to have a cleavable N-term signal seq. 

ALOM program count: 4 value: -5.41 threshold: 0.0 

INTEGRAL Likelihood = -5.41 Transmembrane 176 - 192 ( 175 - 195) 
INTEGRAL Likelihood = -5.31 Transmembrane 92 - 108 ( 90 - 110) 
INTEGRAL Likelihood = -2.18 Transmembrane 129 - 145 ( 129 - 145) 
PERIPHERAL Likelihood =0.05 57 
modified ALOM score: 1.58 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.3166 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC65028 GB:AE001188 conserved hypothetical integral membrane 
protein [Treponema pallidum] 
Identities = 54/190 (28%), Positives = 93/190 (48%), Gaps = 14/190 (7%) 

Query: 14 LFFIVISFGIKYYHLQG--PNLIWNMTLALIALDFAYLTSL--FKKKILIGLFALAWFFF 69 

+F +++SFG + L+WN+ LA I ++ + F+ + LWF 

Sbjct: 3 VFCLLLSFGRRCVAADNFLSFLVWNLVIiAFIPWLISAILHVRRFAVRSVQLFLMLLWLLF 62 

Query: 70 YPNTFYMLTDIIHMHFVGDVLYNKTNLILYILYVSSILFGFLSGIESFSVIMRKFRISNI 129 

+PN Y+LTDIIH+ L +IL + + + F+S S++ R F I 

Sbjct: 63 FPNAPYILTDIIHLGKGKSFLLYYDLIILLAYSFTGLFYAFVSLHLIESILARDFHIKRP 122 

Query: 130 FLRWGIIGIVSL-VSSFGIHIGRYARLNSWDILTKPQWINELLAVPSR DSFHFI 183 

F II + L + +FGI++GR+ R NSWDI+ + +++++ R D++ F+ 

Sbjct: 123 F IISVFELYLCAFGIYLGRFLRWNSWDIVLHGRTILSDIGIRVIRPVFYVDTWMFV 178 



Query: 184 LGFTFLQVLC 193 
F + VLC 
Sbjct: 179 FFFGTMLVLC 188 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
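
The "INTEGRAL Likelihood" lines report predicted membrane-spanning segments as a score followed by two residue ranges. A short sketch, assuming the line layout shown in Example 1108, extracts these segments and orders them along the sequence. The names `parse_tm_segments`, `core` and `extended` are hypothetical labels chosen here; in particular, treating the parenthesised range as an extended span around the predicted segment is an assumption, not something the listing states.

```python
import re

TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return predicted segments sorted by start position."""
    segments = []
    for match in TM_RE.finditer(text):
        likelihood = float(match.group(1))
        start, end, ext_start, ext_end = (int(g) for g in match.groups()[1:])
        segments.append(
            {
                "likelihood": likelihood,
                "core": (start, end),              # first range on the line
                "extended": (ext_start, ext_end),  # parenthesised range
            }
        )
    return sorted(segments, key=lambda s: s["core"][0])


# The four segments listed for SEQ ID 3428 in Example 1108.
example = """
INTEGRAL Likelihood = -5.41 Transmembrane 180 - 196 ( 179 - 199)
INTEGRAL Likelihood = -5.31 Transmembrane 96 - 112 ( 94 - 114)
INTEGRAL Likelihood = -2.18 Transmembrane 129 - 145 ( 129 - 145)
INTEGRAL Likelihood = -1.33 Transmembrane 37 - 53 ( 37 - 53)
"""
for seg in parse_tm_segments(example):
    print(seg["core"], seg["likelihood"])
```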

Example 1109 

A DNA sequence (GBSx1184) was identified in S.agalactiae <SEQ ID 3429> which encodes the amino 
acid sequence <SEQ ID 3430>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.79 Transmembrane 171 - 187 ( 166 - 191) 

Final Results 

bacterial membrane --- Certainty=0.3718 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1110 

A DNA sequence (GBSx1185) was identified in S.agalactiae <SEQ ID 3431> which encodes the amino 
acid sequence <SEQ ID 3432>. Analysis of this protein sequence reveals the following: 

Possible site: 29 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.5182 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF95422 GB:AE004299 conserved hypothetical protein [Vibrio cholerae] 
Identities = 193/471 (40%), Positives = 286/471 (59%), Gaps = 42/471 (8%) 

Query: 1 MEKFFKLKEHGTTIRTEITAGLTTFFAMSYILFVNPAILSQTGMPAQGVFLATIIGAWA 60 

+EK FKL E+GT +RTEI AG+TTF M+YI+FVNPAILS GM VF+AT + A + 
Sbjct: 2 LEKLFKLSEYGTNVRTEILAGWTFLTMAYIIFVNPAILSDAGMDRGAVFVATCLAAAIG 61 

Query: 61 TSVl^FYANLPYAQAPGMGIiNAFFTYTWFALGYTWQEALAMVFICGLISLIITLTKVRK 120 

+M F AN P AQAPGMGLNAFFTY W +G+TWQ ALA VF G++ ++++L K+R+ 
Sbjct: 62 CFIMGFIANYPIAQAPGMGLNAFFTYGWLGMGHTWQVALAAVFCSGVLFILLSLFKIRE 121 

Query: 121 MIIESIPTTLKSAITAGIGTFIAYVGIKNRGFLKFSIDPGTyDWGKGAAKGIiATITAMS 180 

II SIP +L++ I+AGIG FLA++ +KNAG + +P T +V GA h + 
Sbjct: 122 WIINSIPHSLRTGISAGIGLFLAFIALKNAGIV--VDNPAT--LVSLGAITSLHAV 173 

Query: 181 SATPGLVSFDNPAILLSLIGLSITIFFIVKGIRGGIILSILTTTLLGILMGWKLDAINW 240 

L+ +G +TI + +G++G ++++IL T LG++ G V+ I 
Sbjct: 174 LAAVGFFLTIGIiVYRGVKGAVMIAIIAVTALGIiVFGDVQWGGIMS 218 

Query: 241 EATNLSASFRDLKQVFGVALGEKGLISLFSNPSRLPSVLMAIIAFSLTDIFDTIGTLIGT 300 

+++ +F Q+ A+ E G+IS+ + AF D+FDT GTL+G 

Sbjct: 219 TPPSIAPTF MQLDFSAVFEIGMISV VFAFLFVDLFDTAGTLVGV 262 

Query: 301 GEKVGIIATTGDNHESKSLDKALYSDLIGTTFGAICGTSNVTTYVESAAGIGAGGRTGLT 360 

K G++ G + L++AL +D T+ GA+ GTSN T+Y+ES +G+ GGRTGLT 
Sbjct: 263 ATKAGLIEKDG KIPRLNRALLADSTATSVGALLGTSNTTSYIESVSGVAVGGRTGLT 319 

Query: 361 ALWAGLFAISSFFSPLVSIVPSQATAPILVIVGIMMLSNLKDIKWDDMSEAIPAFFTSL 420 

A+W LF ++ FFSPL ++P+ ATA L V I+M+S L I W D++EA P T L 
Sbjct: 320 AVWGILFLLiALFFSPIAGMIPAYATAGALFYTOILMMSGLVSIDWRDLTFJ^PTVVTCL 379 

Query: 421 FMGFTYSITYGIAAGFLTYTLAKVIKGQAKDIHWLWILDILFILNFISLA 471 

M T+SI GI+ GF+ Y K+ G+ + + + +W++ +F++ +1 A 
Sbjct: 380 MMPLTFSIAEGISLGFIAYAAIKLFSGKGRSVSLSVWVMAAIFVIKYILAA 430 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3433> which encodes the amino acid 
sequence <SEQ ID 3434>. Analysis of this protein sequence reveals the following: 

Possible site: 51 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane --- Certainty=0.5628 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 



The protein has homology with the following sequences in the databases: 

>GP:BAB04327 GB:AP001509 unknown conserved protein [Bacillus halodurans] 
Identities = 192/485 (39%), Positives = 276/485 (56%), Gaps = 53/485 (10%) 

Query: 1 MEKFFKLSENGTTVSTEIMAGLTTFFAMSYILFvNPSILGAAGMPSNAVFLATIIAAA.IS 60 

M+++F E+GTT E +AGLTTF +M+YILFVNP ILG AGM AVF+AT +AAAI 
Sbjct: 1 MDRYFGFKEHGTTYGRESIAGLTTFLSMAYILFVNPLILGDAGMDVQAVFMATALAAAIG 60 

Query: 61 TLIMGLFANVPYALAPGMGIJ^AFFTYTvvFALRFSWQEAIiAWFICGLFNIFITVTKFRK 120 

TLIMG+ A P ALAPGMGLNAFF Y+W + WQ AL VF+ G+ I ITV K R+ 
Sbjct: 61 TLIMGILAKYPIALAPGMGLNAFFAYSWIGMGIDWQLALFGVFVSGIIFILITVFKIRE 120 

Query: 121 SIIKAIPVSLQHAIGGGIGVFVAYLGFKNANIITFSISAENIWIVNGVEPAKASAKTFAD 180 

II AIP L++A GIG+F+A++G KNA 1+ 
Sbjct: 121 VIINAIPAELKNAAAAGIGLFIAFIGLKNAGIW 154 

Query: 181 GLLFVDANGGWPTISSFTDSGVLLAIFGLLLTTALVIRNFRGAILIGIVATTLVGIPLG 240 

++ ++ + LLA FGL++T ++R +G I G++ T +VG+ G 

Sbjct: 155 SDFATAVSLGHILNGPTLLACFGLIVTVLFMVRGIQGGIFYGMILTAIVGLISG 208 

Query: 241 IVDVSNLNFGISHIGEAWTELGTTFLAAFD-GLSSLFSDSSRLPLVFMTIFAFSLSDTFD 299 

1+ + I L TF AF+ ++ +FS + + F D FD 

Sbjct: 209 IITYTG GGIVSTPPSLAPTFGQAFNIQMftDVFSVQ FLIWLTFLFVDFFD 258 

Query: 300 TIGTFIGTGRRTGIFSQDDENALENSIGFSSKMDRALFADAIGTSIGALVGTSNTTTYVE 359 

T GT G + G F +D++ + +AL AD+ TSIGA++GTS TT Y+E 

Sbjct: 259 TAGTLYGVANQAG - F I KDNK LPRAGKALLADSSATSIGAILGTSTTTAYIE 308 

Query: 360 SAAGIAEGGRTGLTAVSTAVCFLLSILLLPLVGIVPAAATAPALIIVGVMMVSSFLDVNW 419 

S+AG+A GGRTG ++ TA F+L++ PL+ +V TA ALI+VG++M SS ++W 
Sbjct: 309 SSAGVAAGGRTGFASIVTAGLFVLAMFFSPLLSWTEQVTA&ALIVVGILMASSLRFIDW 368 

Query: 420 SKFADALPAFFAAFFMALCYSISYGIAAAFIFYCLVKWEGKTKDIHPIIWGATFLFIVN 479 

+K A+P+F M L YSI+ GIA F+FY + +V+G+ K++HPI++ F+F+ 

Sbjct: 369 TKLEIAIPSFLTWAMPLTYSIATGIAFGFLFYPITMIVKGRGKEVHPIMYALFFVFLAY 428 

Query: 480 FIILT 484 
FI L+ 

Sbjct: 429 FIFLS 433 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 258/488 (52%), Positives = 336/488 (67%), Gaps = 17/488 (3%) 

Query: 1 MEKFFKLKEHGTTIRTEITAGLTTFFAMSYILFVNPAILSQTGMPAQGVFLATIIGAWA 60 

MEKFFKL E+GTT+ TEI AGLTTFFAMSYILFVNP+IL GMP+ VFLATII A ++ 
Sbjct: 1 MEKFFKLSENGTTVSTEIMAGLTTFFAMSYILFVNPSILGA&GMPSNAVFLATIIAAAIS 60 

Query: 61 TSV^FYANLPYAQAPGMGLNAFFTYTVVFAIjGYTWQEALAMVFICGLISLIITLTKVRK 120 

T +M +AN+PYA APGMGLNAFFTYTWFAL ++WQEALAMVFICGL ++ IT+TK RK 
Sbjct: 61 TLIMGLFAWPYAIAPGMGimFFTYTWFALRFSWQEALAMWICGLFNIFITVTKFRK 120 

Query: 121 MIIESIPTTLKSAITAGIGTFLAYVGIKNAGFLKFSIDPGTYDW --GKGAAK 171 

II++IP +L+ AI GIG F+AY+G KNA + FSI +V K A 

Sbjct: 121 SIIKAIPVSLQHAIGGGIGVFVAYLGFKNANIITFSISAENIvMVNGVEPAKASAKTFAD 180 

Query: 172 GLATITANSSATPGLVSFDNPAILLSLIGLSITIFFIVKGIRGGIILSILTTTLLGILMG 231 

GL + AN P + SF + +LL++ GL +T +++ RG I++ 1+ TTL+GI +G 
Sbjct: 181 GLLFVDANGGWPTISSFTDSGVLLAIFGLLLTTALVIRNFRGAILIGIVATTLVGIPLG 240 

Query: 232 WKLDAINWEATNLSAS FRDLKQVFG VALGEKGLI SLFSNPSRLPS VLMAI LAFSLTD I F 291 

+V + +N+ +++ ++ +L FA GL SLFS+ SRLP V M I AFSL+D F 
Sbjct: 241 IVDVSNLNFGISHIGEAWTELGTTFIAAF--DGLSSLFSDSSRLPLVFMTIFAFSLSDTF 298 

Query: 292 DTIGTLIGTGEKVGILATTGDN HESKSLDKALYSDLIGTTFGAICGTSNVTTYV 345 

DTIGT IGTG + GI + +N S +D+AL++D IGT+ GA+ GTSN TTYV 

Sbjct: 299 DTIGTFIGTGRRTGIFSQDDENALENSIGFSSKMDRALFADAIGTSIGALVGTSNTTTYV 358 

Query: 346 ESAAGIGAGGRTGLTALWAGLFAISSFFSPLVSIVPSQATAPILVIVGIMMLSNLKDIK 405 

ESAAGI GGRTGLTA+ A F +S PLV IVP+ ATAP L+IVG+MM+S+ D+ 
Sbjct: 359 ESAAGIAEGGRTGLTAVSTAVCFLLSILLLPLVGIVPAAATAPALIIVGVMMVSSFLDVN 418 

Query: 406 WDDMSFAIPAFFTSLFMGFTYSITYGIAAGFLTYTLAKVIKGQAKDIHVVLWILDILFIL 465 

W ++A+PAFF + FM YSI+YGIAA F+ Y L KV++G+ KDIH ++W LFI+ 
Sbjct: 419 WSKFADALPAFFAAFFMALCYSISYGIAAAFIFYCLVKVVEGlCrKDIHPIIWGATFLFIV 478 

Query: 466 NFISLAIL 473 

NFI L IL 
Sbjct: 479 NFIILTIL 486 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1111 

A DNA sequence (GBSx1186) was identified in S.agalactiae <SEQ ID 3435> which encodes the amino 
acid sequence <SEQ ID 3436>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3221 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04264 GB:AP001508 unknown conserved protein [Bacillus halodurans] 
Identities = 68/147 (46%), Positives = 100/147 (67%), Gaps = 1/147 (0%) 

Query: 27 MFYTQNEEEL1ALGQKLGTVLKSGDI VLLTGNLGAGKTTLTKGIAKGLDIKQM1KSPTYT 86 

M TQ+ E +A QKL L +GD++ L G+LGAGKT+ TKG+A GL IK+++KSPT+T 
Sbjct: 5 I#lITQSPEATmFAQKIADKLlAGDVITLEGDLGAGKTSFTKGLALGLGIKRvVKSPTFT 64 

Query: 87 IVREYEGRVPLYHLDvYRIGDDPDSIDLDDFLFGQGVTVIEWGELLSDNLINNYLEIVIT 146 

I+REY+GR+PLYH+DVYR+ ++ + + D++ G GVTV+EW L+ h L I IT 
Sbjct: 65 IIREYKGRLPLYHMDVYRLNEEEEDLGFDEYFHGDGVTWEWASLIEGRLPPVRIiAITIT 124 

Query: 147 RSNQG-RQVQLEAYGHRAREI IEAI QD 172 
+ + RQ+ AYG R E+++ + D 

Sbjct: 125 HAGENERQLSFTAYGERWEEVLKELLD 151 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3437> which encodes the amino acid 
sequence <SEQ ID 3438>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1202 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 97/142 (68%), Positives = 122/142 (85%) 

Query: 27 MFYTQNEEELIALGQKLGTVLKSGDI VLLTGNLGAGKTTLTKGIAKGLDIKQMIKSPTYT 86 

MFY++NE L A G+ LGT L GD+++L+G+LGAGKTTL KGIAKG+ I QMIKSPTYT 
Sbjct: 1 MFYSENEYTLKAYGETLGTYLSIGDVIVLSGDLGAGKTTLAKGIAKGMGISQMIKSPTYT 60 

Query: 87 IvREYEGRVPLYHLDVYRIGDDPDSIDLDDFLFGQGVTVIEWGELLSDNLINNYLEIVIT 146 

IVREYEGR+PLYHLD+YR+GDDPDSIDLDDFLFG GVTVTEWGELL + L+ +YL+I IT 
Sbjct: 61 IVREYEGRLPLYHLDIYRVGDDPDSIDLDDFLFGNGVTVIEWGELLGEGLLQDYLQITIT 120 

Query: 147 RSNQGRQVQLEAYGHRAREIIE 168 
+ ++GRQ+ L A+G R+R+++E 

Sbjct: 121 KRDKGRQLDLLAHGERSRQLLE 142 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
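
Every example opens with the same sentence linking a GBSx identifier to a pair of sequence identifiers, the DNA sequence and the amino acid sequence it encodes. The sketch below, which is illustrative only, pulls those three values out of such a sentence. The function name and dictionary keys are hypothetical, and the pattern assumes the wording used in Example 1111, including a possible line break between "amino" and "acid".

```python
import re

HEADER_RE = re.compile(
    r"A DNA sequence \((GBSx\d+)\) was identified in (\S+) "
    r"<SEQ ID (\d+)> which encodes the amino\s+acid sequence <SEQ ID (\d+)>"
)


def parse_example_header(text):
    """Return the identifiers named in an example's opening sentence, or None."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    gbs_id, organism, dna_id, protein_id = match.groups()
    return {
        "gbs_id": gbs_id,
        "organism": organism,
        "dna_seq_id": int(dna_id),
        "protein_seq_id": int(protein_id),
    }


# Opening sentence of Example 1111, with the line break kept as in the text.
header = (
    "A DNA sequence (GBSx1186) was identified in S.agalactiae <SEQ ID 3435> "
    "which encodes the amino\nacid sequence <SEQ ID 3436>."
)
print(parse_example_header(header))
```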

Example 1112 

A DNA sequence (GBSx1187) was identified in S.agalactiae <SEQ ID 3439> which encodes the amino 
acid sequence <SEQ ID 3440>. Analysis of this protein sequence reveals the following: 

Possible site: 58 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1782 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD35662 GB:AE001732 conserved hypothetical protein [Thermotoga maritima] 
Identities = 56/163 (34%), Positives = 94/163 (57%), Gaps = 1/163 (0%) 

Query: 24 E1ASREEASAILEFLNTOTEETDFILHTVSNQLSLSEMETFIENTLMTKNCICLIAKLKNK 83 

EAS +A I+E+L VT ETDF++ +S +1 + ++ ++ + 

Sbjct: 18 EASIWDARRIVEYLKEVTSETDFLITRPDEVyDVSTERNYIRMYRSNPGKLMIVGEINRE 77 

Query: 84 VIGLITIISQSDIEIEHVGDLFIAVQKDYWGYGIGHILMEEAIEWASDNDITRRLELSVQ 143 

++ L+T +HVG++ I+V+K YW GIG ++ AIEWA N R++L V 

Sbjct: 78 IVSLLTFTGFGRKRTKHVGEIGISVKKRYWNIGIGTRMITSAIEWARRNGFI-RIQLEVL 136 

Query: 144 GRNERAIHLYQKFGFEIDGLQTRGIKRENGEFLDIYRMSKLID 186 

NERAI LY+K GFE++G++ + ++R++G F D+ M+ L+D 
Sbjct: 137 KSNERAISLYRKLGFELEGIKRKAVRRDDGSFEDVLVMALLLD 179 

There is also homology to SEQ ID 1724. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1113 

A DNA sequence (GBSx1188) was identified in S.agalactiae <SEQ ID 3441> which encodes the amino 
acid sequence <SEQ ID 3442>. Analysis of this protein sequence reveals the following: 

Possible site: 53 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15582 GB:Z99122 membrane-bound protein [Bacillus subtilis] 
Identities = 108/324 (33%), Positives = 178/324 (54%), Gaps = 33/324 (10%) 

Query: 5 KKITLMFSAIILTTVIALGV--YVASAYNFSTNELSKTFKDFKLAKS--KSHAIEETKPF 60 

KK TL+ + + + ++ LG Y ++ + + ++ + +K K +1 + PF 
Sbjct: 8 KKKTLLLTILTIIGLLVLGTGGYAYYLWHKARSTVASIHESIDKSKKRDKEVSINKKDPF 67 

Query: 61 SILLMGVDTGSEHRKSKWSGNSDSMILVTINPKTNKTTMTSLERDVLIKLSGPKNNGQTG 120 

S+L+MGVD + G +D++I +T+NPKTN T M S+ RD K+ G G 
Sbjct: 68 SVLIMGVDERDGDK GRADTLiyMTVNPKTNTTDMVS I PRDTYTKI IGK G 116 

Query: 121 VFAKLNAAYASGGAEMALMTVQDLLDIIWDYFM^ 180 

K+N +YA GG +M + TV++ LD+ VDYF+++NM+ D+V+ +GGITV + F F 
Sbjct: 117 TMDKINHSYAFGGTQMTVDTVENFLDVPVDYFTO7NMESFRDVVDTLGGITVNSTFAFSY 176 

Query: 181 S IAANEPEYKAWEPGTHKINGEQALVYSRMRYDDPEGDYGRQKRQREVI QKVLKKI LAL 240 

+ G +NG++AL Y+RMR +DP GD+GRQ RQR+VIQ ++ K + 

Sbjct: 177 DGYS FGKGEITLNGKEALAYTRMRKEDPRGDFGRQDRQRQVIQGIINKGANI 228 



Query: 241 NSISSYKKILSAVSNNMQTNIEISSKTIPNL LAYKDSLEHI KSYQLKGEDATLSDG 296 

+SI+ + + V NN++TN+ T N+ YK + +HIK ++LKG T +G 

Sbjct: 229 SSITKFGDMFKWENNVKTNL TFDNMWDIQSDYKGARKHI KQHELKG - TGTKING 282 

Query: 297 GS YQI LTKKHLLAVQNRI KKELDK 320 
Y + L + +K+ L+K 

Sbjct: 283 IYYYQADESALSDITKELKESLEK 306 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2763> which encodes the amino acid 
sequence <SEQ ID 2764>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 288/436 (66%), Positives = 342/436 (78%), Gaps = 22/436 (5%) 

Query: 1 MKIWKKITLMFSAIILTTVIALGVYVASAYNFSTNELSKTFKDFKLAKSKSHAIEETKPF 60 

MKI KKI LMF+AI +LTTV+ALG VY+ SAY FST ELSKTFKDF + +KS AI++T+ F 
Sbjct: 1 MKIGKKIVLMFTAI VLTTVLALGVYLTSAYTFSTGELSKTFKDFSTSSNKSDAIKQTRAF 60 

25 Query: 61 S1LLMGVDTGSEHRKSKWSGNSDSMILVTINPKTNKTTMTSLERDVLIKLSGPKNNGQTG 120 

SILLMGVDTGS R SKW GNSDSMILVT+NPKT KTTMTSLERD L LSGPKNN G 
Sbjct: 61 SILLMGVDTGSSERASKWEGNSDSMILVTVNPKTKKTTMTSLERDTLTTLSGPKNNEMNG 120 

Query: 121 VEAKLNAAYASGGAEMALMTVQDLLDINTOYFMQIIWQGL^ 180 
VEAKLNAAYA+GGA+MA+MTVQDLL+I +D ++QINMQGL+DLVNAVGGITVTN+FDFPI 

Sbjct: 121 VEAKLNAAYAAGGAQMAIMWQDLMITIDNYVQ^ 180 

Query: 181 SIAANEPEYKAVVEPGTHKINGEQALVYSRMRYDDPEGDYGRQKRQREVIQKVLKKILAL 240 
SIA NEPEY+A V PGTHKINGEQALVY+RMRYDDPEGDYGRQKRQREVIQKVLKKIIAL 
Sbjct: 181 SIAENEPEYQATVAPGTHK1NGEQALVYARMRYDDPEGDYGRQKRQREVIQKVLKKILAL 240 

Query: 241 NSISSYKKILSAVSNNMQTNIEISSKTIPNLLAYKDSLEHIKSYQLKGEDATLSDGGSYQ 300 

+SISSY+KILSAVS+NMQTNIEISS+TIP+LL Y+D+L IK+YQLKGEDATLSDGGSYQ 
Sbjct: 241 DSISSYRKILSAVSSNMQTNIEISSRTIPSLLGYRDALRTIKTYQLKGEDATLSDGGSYQ 300 


Query: 301 ILTKKHLLAVQNRIKKELDKKRSKTLKTSAILYEDYYGTTASNDSSTYSSTQENNYNTT- 359 

I+T HLL +QNRI+ EL + LKT+A +YE+ YG ST S T NNY+++ 

Sbjct: 301 IVTSNHLLEIQNRIRTELGLHKVNQLKTNATVYENLYG STKSQTVNNNYDSSG 353 

Query: 360 ---PYSEAPPSYSG NTTYSSETNQTTHQNYYNSSTPASNYSSNTNTGQADSSGSV 411 

YS++ SY+ +T S+ T+Q + + + +TP+S+ S ++ SSGS 

Sbjct: 354 QAPSYSDSHSSYANYSSGVDTGQSASTDQDSTASSHRPATPSSS-SDALAADESSSSGS- 411 

Query: 412 NNHNGAATPNPNTGTQ 427 
G+ P N Q 

Sbjct: 412 GSLVPPANINPQ 423 

SEQ ID 3442 (GBS54) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 16 (lane 8; MW 48.4kDa). 

The GBS54-His fusion product was purified (Figure 98A; see also Figure 194, lane 6) and used to immunise 
mice (lane 1+2 product; 20µg/mouse). The resulting antiserum was used for Western blot (Figure 98B), 
FACS (Figure 98C), and in the in vivo passive protection assay (Table III). These tests confirm that the 
protein is immunoaccessible on GBS and that it is an effective protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1114 

A DNA sequence (GBSx1189) was identified in S.agalactiae <SEQ ID 3443> which encodes the amino 
acid sequence <SEQ ID 3444>. This protein is predicted to be Vesl-1L. Analysis of this protein sequence 
reveals the following: 

Possible site: 18 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood = -1.44 Transmembrane 3 - 19 ( 3 - 19) 

Final Results 

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3445> which encodes the amino acid 
sequence <SEQ ID 3446>. Analysis of this protein sequence reveals the following: 

Possible site: 15 
>>> Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ> 

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below. 

Identities = 42/98 (42%), Positives = 64/98 (64%) 

Query: 1 MKIGRLIALGLVSLGALELYKNRKTIKDSYQNTKNETDSAKLKLERIKNDLAIISQEKEK 60 

MK+ +IA+GL+S A + Y+ R TIK+ ++ D+A+L L+ IK +L +1 + + 
Sbjct: 1 MKVKTVIAVGLLS FTAYKAYQKRCT I KELLS I SRQAKDAAQLDLDNI KANLDLIHSQGKV 60 

Query: 61 IRLISQELNHKFQVFNKDIQPRLEEINQRMAKYQEKDE 98 

1+ ISQ+L KK++ FN++ Q L EI RMAKYQE E 
Sbjct: 61 IQNISQDLAHKWRYFNQETQAHLTEIQNRMAKYQEDSE 98 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1115 

A DNA sequence (GBSx1190) was identified in S.agalactiae <SEQ ID 3447> which encodes the amino 
acid sequence <SEQ ID 3448>. This protein is predicted to be Hit-like protein involved in cell-cycle 
regulation (hit). Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2694 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAB04908 GB:AP001511 Hit-like protein involved in cell-cycle 
regulation [Bacillus halodurans] 
Identities = 74/137 (54%), Positives = 95/137 (69%), Gaps = 2/137 (1%) 

Query: 3 NCI FCKI I SGEIPSSKWEDDEVLAFLDITQTTTGHTLLI PKKHVRNVLEMDEKTAQITF 62 

NCIFCKII+GEIPS+ VYEDD V AFLDI+Q T GHTL+IPK H RNV E+ E+ A F 
Sbjct: 6 NCIFCKI IAGEIPSATVYEDDHVYAFLDISQVTKGHTLVIPKVHKRNVFELSEEIASSLF 65 

Query: 63 ERLPKVAI^VQAATKAKGMNIINNNEEIAGQTvFHAHvHLVPRFDESDGIKIHYTTHEPD 122 

+PK++RA+ A + GMNI+NNN E AGQTVFH H+HL+PR+ E DG + H 
Sbjct: 66 AAVPKISRAINDAFQPIGMNIVNNNGEARGQTVFHYHLHLLPRYGEGDGYGAVWKDHSSQ 125 

Query: 123 F--EALAKLAKEIRKEI 137 

+ + L L+ IR+ + 
Sbjct: 126 YSGDDLQVLSSSIREHL 142 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3449> which encodes the amino acid 
sequence <SEQ ID 3450>. Analysis of this protein sequence reveals the following: 

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0125 (Affirmative) < succ> 

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ> 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ> 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 97/137 (70%), Positives = 117/137 (84%) 

Query: 1 MDNCI FCKI I SGEI PSSKVYEDDE VIAFLDITQTTTGHTLLIPKKHVRNVLEMDEKTAQI 60 

M+NCIFC II G+IPSSKVYED++VLAFLDI+QTT GHTL+IPK+HVRN+LEM +TA 
Sbjct: 1 MENCIFCSIIQGDIPSSKVYEDEQVIAFIjDISCjTTKGHTLVIPKQHVRNLLEMTAETASH 60 

Query: 61 TFERLPKVARAVQAATKAKGMNI INNNEE IAGQTVFHAHVHLVPRFDESDGI KIHYTTHE 120 

F R+PK+ARA+Q+AT A MNIINNNE +AGQTVFHAHVHLVPR++E DGI I YTTHE 
Sbjct: 61 LFARI PKIARAIQSATGATAMNI INNNEAIAGQTVFHAHVHLVPRYNEEDGI SIQYTTHE 120 

Query: 121 PDFEALAKLAKEIRKEI 137 

PDF L KLA++I +E+ 
Sbjct: 121 PDFPVLEKLARQINQEV 137 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1116 

A DNA sequence (GBSxll91) was identified in S.agalactiae <SEQ ID 3451> which encodes the amino 
acid sequence <SEQ ID 3452>. Analysis of this protein sequence reveals the following: 

Possible site: 36 

>» Seems to have a cleavable N-term signal seq. 

Final Results 

bacterial outside Certainty=0. 3 000 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty^ 0 . 0000 (Not Clear) < suco 



A related GBS nucleic acid sequence <SEQ ID 10923> which encodes amino acid sequence <SEQ ID 
10924> was also identified. 
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The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 3452 (GBS87) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 8 (lane 3; MW 19.5kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 15 (lane 10; MW 44kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1117 

A DNA sequence (GBSx1192) was identified in S.agalactiae <SEQ ID 3453> which encodes the amino 
acid sequence <SEQ ID 3454>. This protein is predicted to be an ABC transporter, ATP-binding protein. 
Analysis of this protein sequence reveals the following: 






Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -6.53 Transmembrane 143 - 159 ( 141 - 161) 



Final Results 

bacterial membrane Certainty=0.3612 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9563> which encodes amino acid sequence <SEQ ID 9564> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12844 GB:Z99109 ABC transporter (ATP-binding protein) [Bacillus subtilis] 
Identities = 137/242 (56%), Positives = 181/242 (74%) 

Query: 1 MTMLKIENVTGGYVNIPVLKNISFEVNDGELVGLIGLNGAGKSTTINEIIGILRPYQGDI 60 
M++L ++++TGGY PVLKN+SF + ++VGLIGLNGAGKSTTI IIG++ P++G I 
Sbjct: 1 MSLLSVKDLTGGYTRNPVLKNVSFTLEPNQIVGLIGLNGAGKSTTIRHIIGLMDPHKGSI 60 

Query: 61 TIDGISLEADQELYRKKIGFIPETPSLYEELTLREHLETVAMAYDIATDEVMARAQKLLE 120 
++G + D E YR + +IPETP LYEELTL EHLE AMAY ++ + + R LL+ 
Sbjct: 61 ELNGKTFAEDPEGYRSQFTYIPETPVLYEELTLMEHLELTAMAYGLSKETMEKRLPPLLK 120 

Query: 121 MFRLTDKLDWFPMHFSKGMKQKVMIICAFVVSPSLFIVDEPFLGLDPLAISDLINLLAEE 180 
FR+ +L WFP HFSKGMKQKVMI+CAF+ P+L+I+DEPFLGLDPLAI+ L+ + E 
Sbjct: 121 EFRMEKRLKWFPAHFSKGMKQKVMIMCAFLAEPALYIIDEPFLGLDPLAINALLERMNEA 180 

Query: 181 KAKGKSILMSTHVLDSAEKMCDRFVILHKGEIRAVGTLEELRAIFGDSNANLNDIYIALT 240 
K G S+LMSTH+L +AE+ CD F+ILH GE+RA GTL ELR FG +A L+D+Y+ LT 
Sbjct: 181 KKGGASVLMSTHILATAERYCDSFIILHNGEVRARGTLSELREQFGMKDAALDDLYLELT 240 

Query: 241 KE 242 
KE 
Sbjct: 241 KE 242 
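
Because each alignment row carries its start and end coordinates, a row can be checked for transcription damage by comparing its residue count with the coordinate span. The sketch below is illustrative only and not part of the original disclosure; the function names are hypothetical. It counts letters only, so gap characters and stray spaces are ignored.

```python
def residue_count(row):
    """Number of residue letters in an alignment row (gaps and spaces ignored)."""
    return sum(1 for ch in row if ch.isalpha())


def row_is_consistent(start, row, end):
    """True if the row holds exactly end - start + 1 residues."""
    return residue_count(row) == end - start + 1


# First and last Query rows of the alignment above.
first_row = "MTMLKIENVTGGYVNIPVLKNISFEVNDGELVGLIGLNGAGKSTTINEIIGILRPYQGDI"
print(row_is_consistent(1, first_row, 60))
print(row_is_consistent(241, "KE", 242))
```

A row that fails this check has usually lost or gained a character somewhere, although the check cannot say where, and a row that passes may still contain substituted characters.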

A related DNA sequence was identified in S.pyogenes <SEQ ID 3455> which encodes the amino acid 
sequence <SEQ ID 3456>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -5.04 Transmembrane 141 - 157 ( 139 - 158) 

Final Results 

bacterial membrane Certainty=0.3017 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB12844 GB:Z99109 ABC transporter (ATP-binding protein) 
[Bacillus subtilis] 
Identities = 139/241 (57%), Positives = 189/241 (77%) 

Query: 1 MLNIKNLTGGYHNIPVLNDVSFSVDNGELVGLIGLNGAGKSTTINEIIGFLKPYQGSISI 60 
+L++K+LTGGY PVL +VSF+++ ++VGLIGLNGAGKSTTI IIG + P++GSI + 
Sbjct: 3 LLSVKDLTGGYTRNPVLKNVSFTLEPNQIVGLIGLNGAGKSTTIRHIIGLMDPHKGSIEL 62 

Query: 61 DGLTLAENAVAYRQKIGFIPETPSLYEELTLSEHINTVAMAYDIDLEVAQKRAQPFLEMF 120 
+G T AE+ YR + +IPETP LYEELTL EH+ AMAY + E +KR P L+ F 
Sbjct: 63 NGKTFAEDPEGYRSQFTYIPETPVLYEELTLMEHLELTAMAYGLSKETMEKRLPPLLKEF 122 

Query: 121 RLTDKLEWFPVNFSKGMKQKVMIICAFVIDPSLFILDEPFLGLDPLAISDLIQTLEVEKA 180 
R+ +L+WFP +FSKGMKQKVMI+CAF+ +P+L+I+DEPFLGLDPLAI+ L++ + K 
Sbjct: 123 RMEKRLKWFPAHFSKGMKQKVMIMCAFLAEPALYIIDEPFLGLDPLAINALLERMNEAKK 182 

Query: 181 KGKSILMSTHVLDSAERMCDRFVILHHGQVRAQGTLADLQEAFGDRSASLNDIYLALTKED 241 
G S+LMSTH+L +AER CD F+ILH+G+VRA+GTL++L+E FG + A+L+D+YL LTKED 
Sbjct: 183 GGASVLMSTHILATAERYCDSFIILHNGEVRARGTLSELREQFGMKDAALDDLYLELTKED 243 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 181/240 (75%), Positives = 208/240 (86%) 

Query: 3   MLKIENVTGGYVNIPVLKNISFEVNDGELVGLIGLNGAGKSTTINEIIGILRPYQGDITI 62
           ML I+N+TGGY NIPVL ++SF V++GELVGLIGLNGAGKSTTINEIIG L+PYQG I+I
Sbjct: 1   MLNIKNLTGGYHNIPVLNDVSFSVDNGELVGLIGLNGAGKSTTINEIIGFLKPYQGSISI 60

Query: 63  DGISLFADQELYRKKIGFIPETPSLYEELTLREHLETVAMAYDIATDEVMARAQKLLEMF 122
           DG++L  +   YR+KIGFIPETPSLYEELTL EH+ TVAMAYDI  +    RAQ  LEMF
Sbjct: 61  DGLTLAENAVAYRQKIGFIPETPSLYEELTLSEHINTVAMAYDIDLEVAQKRAQPFLEMF 120

Query: 123 RLTDKLDWFPMHFSKGMKQKVMIICAFVVSPSLFIVDEPFLGLDPLAISDLINLLAEEKA 182
           RLTDKL+WFP++FSKGMKQKVMIICAFV+ PSLFI+DEPFLGLDPLAISDLI  L  EKA
Sbjct: 121 RLTDKLEWFPVNFSKGMKQKVMIICAFVIDPSLFILDEPFLGLDPLAISDLIQTLEVEKA 180

Query: 183 KGKSILMSTHVLDSAEKMCDRFVILHKGEIRAVGTLEELRAIFGDSNANLNDIYIALTKE 242
           KGKSILMSTHVLDSAE+MCDRFVILH G++RA GTL +L+  FGD +A+LNDIY+ALTKE
Sbjct: 181 KGKSILMSTHVLDSAERMCDRFVILHHGQVRAQGTLADLQEAFGDRSASLNDIYLALTKE 240

SEQ ID 3454 (GBS353) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 74 (lane 2; MW 30kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 6; MW 55kDa). 

GBS353-GST was purified as shown in Figure 216, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1118 

A DNA sequence (GBSx1193) was identified in S.agalactiae <SEQ ID 3457> which encodes the amino 
acid sequence <SEQ ID 3458>. Analysis of this protein sequence reveals the following: 

Possible site: 28 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0.1475 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 1119 

A DNA sequence (GBSx1194) was identified in S.agalactiae <SEQ ID 3459> which encodes the amino 
acid sequence <SEQ ID 3460>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane Certainty=0.6074 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB12845 GB:Z99109 ABC transporter (membrane protein) [Bacillus subtilis] 
Identities = 101/409 (24%), Positives = 187/409 (45%), Gaps = 76/409 (18%) 

Query: 1 MKKLFNKRRSLFLTQNSKYLRYVFNDHFVLVLMFLSGFLLYQYSQLLKDFPKTHWPIIVI 60 
M ++ R + + Y++Y+ NDH V+VL+F YS+ ++D P H+P + 
Sbjct: 4 MLDIWQSRLQEHIKETRTYMKYMLNDHLVIVLIFFLAGAASWYSKWIRDIP-AHFPSFWV 62 

Query: 61 VSIIILMLLAMGGIASYLEPADKQFLLIKEEAIKEIINSAKKRTYI 106 
++++ ++L + + L+ AD FLL E ++ + A +Y+ 
Sbjct: 63 MAVLFSLVLTSSYWTLLKEADLVFLLPLEAKMEPYLKQAFVYSYVSQLFPLIALSIVAM 122 

Query: 107 --FWLVIQTLFLVLISPILIKLGL 128 
++ V LV + + ++L L 
Sbjct: 123 PLYFAVTPGASLVSYAAVFVQLLLLKAWNQVMEWRTTFQNDRSMKRMDVIIRFAANTLVL 182 

Query: 129 SVFMITLLIFGLGIIKWLVITYKVKVFYNNQNLNWDAAINHEQERKQSILKFFSL 183 
SV+M LL++ + + +L ++ K + W++ I E RKQ + +L 
Sbjct: 183 YFVFQSVYMYALLVYVIMAVLYLYMSSAAK RKTFKWESHIESELRRKQRFYRIANL 238 

Query: 184 FTNVXGISTSVKRRSFLDGILKLISKTPSRLWTNLFVRAFLRSSDYLGLTIRLOTLNILS 243 
FT+V + KRR++LD +L+L+ + + +F RAFLRSSDYLG+ +RL + L 
Sbjct: 239 FTDVPHLRKQAKRRAYLDFLLRLVPFEQRKTFAYMFTRAFLRSSDYLGILVRLTIVFALI 298 

Query: 244 VIFVNETYLALALAFVFN-YLLLFQLLALGHHFDYQYMNQLYPVRLNAKASQLKGFLRVL 302 
+++V+ + L A+ VF ++ QLL L HFD+ + +LYPV+ K ++LK + +L 
Sbjct: 299 IMYVSASPLIAAVLTVFAIFITGIQLLPLFGHFDHLALQELYPVQ KETKLKSYFSLL 355 

Query: 303 SYAVTVIDSI LIRELKPVILLIVLMLIVTEYYIPYKIKK 341 
A+++ + L L +1 VL+ +V Y+ ++KK 
Sbjct: 356 KTALS IQALLMSVASAYAAGLTGFLYALIGSAVLI FWLPAYMTTRLKK 404 

A related DNA sequence was identified in S.pyogenes <SEQ ID 3461> which encodes the amino acid 
sequence <SEQ ID 3462>. Analysis of this protein sequence reveals the following: 

Possible site: 44 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial membrane Certainty=0.6965 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB12845 GB:Z99109 ABC transporter (membrane protein) [Bacillus subtilis] 
Identities = 96/403 (23%), Positives = 173/403 (42%), Gaps = 78/403 (19%) 

Query: 1 MKALFLKRRQDFQKQQNKYLRYVLNDHFVLVLMFLLGFAMVQYGQLLN HFPT 52 

M ++ R Q+ K+ Y++Y+LNDH V+VL+F LA Y + + HFP+ 
Sbjct: 4 MLDIWQSRLQEHIKETRTYMKYMLNDHLVIVLIFFLAGAASWYSKWIRDIPAHFPSFWVM 63 

Query: 53 NHLPIQVCLGILIPLLLSM 71 

L + L L+PL M 

Sbjct: 64 AVLFSLVLTSSYWTLLKEADLVFLLPLEAKMEPYLKQAFVYSYVSQLFPLIALSIVAMP 123 

Query: 72 --GSIATYLEEADQHFLLPKEEEVISYI KQAERLSFLLWGTLQTAVLL 117 

S+ +Y Q LL +V+ + + +R+ ++ T VL 

Sbjct: 124 LYFAVTPGASLVSYAAVFVQLLLLKAWNQVMEWRTTFQNDRSMKRMDVIIRFAANTLVLY 183 

Query: 118 FLYPI FRRLGLSLFI FI ILVLILLALKRWLSRKTRYFLRGNRLDWAKAVAFESNRKQS I 177 

F++ S++++ +LV +++A+ + +S + W + E RKQ 

Sbjct: 184 FVFQ SVYMYALLVYVIMAVLYLYMSSAAKR KTFKWESHIESELRRKQRF 232 

Query: 178 LKFYSLFTTVKGISTKVKERTYLNPLLKLVKQTPSNLWLSLYARAFLRSSDYLGLFLRLM 237 

+ +LFT V + + K R YL+ LL+LV + ++ RAFLRSSDYLG+ +RL 

Sbjct: 233 YRIANLFTDVPHLRKQAKRRAYLDFLLRLVPFEQRKTFAYMFTRAFLRSSDYLGILVRLT 292 

Query: 238 LLSSLSVFFIHNLYLSVSIJUjIFN-YLvWQLLSLYYHYDYHYMTSLYPENSRSKKKNML 296 

++ +L + ++ L ++ +F ++ QLL L+ H+D+ + LYP +K K+ 
Sbjct: 293 IVFALIIMYVSASPLIAAVLTVFAIFITGIQLLPLFGHFDHLALQELYPVQKETKLKSYF 352 

Query: 297 SFLR-GLSFLMLIVNMLCCSSAPKA--LILIVGMVFIACIYLP 336 

S L+ LS L++++ +A L ++G + + LP 

Sbjct: 353 SLLKTALS IQALLMSVASAYAAGLTGFLYALIGSAVLI FWLP 395 



An alignment of the GAS and GBS proteins is shown below. 

Identities = 170/344 (49%), Positives = 237/344 (68%) 

Query: 1 MKKLFNKRRSLFLTQNSKYLRYVFNDHFVLVLMFLSGFLLYQYSQLLKDFPKTHWPIIVI 60 
MK LF KRR F Q +KYLRYV NDHFVLVLMFL GF + QY QLL FP H PI V 
Sbjct: 1 MKALFLKRRQDFQKQQNKYLRYVLNDHFVLVLMFLLGFAMVQYGQLLNHFPTNHLPIQVC 60 

Query: 61 VSIIILMLLAMGGIASYLEPADKQFLLIKEEAIKEIINSAKKRTYIFWLVIQTLFLVLIS 120 

+ I+I +LL+MG IA+YLE AD+ FLL KEE + I A++ +++ W +QT L+ + 
Sbjct: 61 LGILIPLLLSMGSIATYLEEADQHFLLPKEEEVISYIKQAERLSFLLWGTLQTAVLLFLY 120 



Query: 121 PILIKLGLSVFMITLLIFGLGIIKWLVITYKVKVFYNNQNLNWDAAINHEQERKQSILKF 180 



